| .. _safety_overview: |
| |
| Zephyr Safety Overview |
| ######################## |
| |
| Introduction |
| ************ |
| |
| This document provides an overview of the safety-relevant activities in the Zephyr Project and of |
| what the project and the Zephyr Safety Working Group / Committee are trying to achieve. |
| |
| This overview is intended for people who are interested in the functional safety aspects of Zephyr |
| RTOS development and for project members who want to contribute to the safety aspects of the |
| project. |
| |
| Overview |
| ******** |
| |
| This section gives the reader an overview of the general goal of the safety certification, which |
| standard we aim to be certified against, and which quality standards and processes need to be |
| implemented to reach such a safety certification. |
| |
| Safety Document update |
| ********************** |
| |
| This document is a living document and may evolve over time as new requirements, guidelines, or |
| processes are introduced. |
| |
| #. Changes will be submitted by the interested parties via pull requests to the Zephyr |
| documentation repository. |
| |
| #. The Zephyr Safety Committee will review these changes and either provide feedback or accept |
| them. |
| |
| #. Once accepted, these changes will become part of the document. |
| |
| General safety scope |
| ******************** |
| |
| The general goal of the Safety Committee is to achieve certification according to the `IEC 61508 |
| <https://en.wikipedia.org/wiki/IEC_61508>`__ standard at Safety Integrity Level (SIL) 3 / |
| Systematic Capability (SC) 3 for a limited source scope (see certification scope TBD). Since the |
| code base is pre-existing, we use the route 3s/1s approach defined by the IEC 61508 standard. |
| |
| Route 3s |
| *Assessment of non-compliant development, which is essentially route 1s applied to existing |
| sources.* |
| |
| Route 1s |
| *Compliant development. Compliance with the requirements of this standard for the avoidance and |
| control of systematic faults in software.* |
| |
| Summary of the IEC 61508 standard |
| ================================= |
| |
| The IEC 61508 standard is a widely recognized international standard for functional safety of |
| electrical, electronic, and programmable electronic safety-related systems. Here's an overview of |
| some of the key safety aspects of the standard: |
| |
| #. **Hazard and Risk Analysis**: The IEC 61508 standard requires a thorough analysis of potential |
| hazards and risks associated with a system in order to determine the appropriate level of safety |
| measures needed to reduce those risks to acceptable levels. |
| |
| #. **Safety Integrity Level (SIL)**: The standard introduces the concept of Safety Integrity Level |
| (SIL) to classify the level of risk reduction required for each safety function. The higher the |
| SIL, the greater the level of risk reduction required. |
| |
| #. **System Design**: The IEC 61508 standard requires a systematic approach to system design that |
| includes the identification of safety requirements, the development of a safety plan, and the |
| use of appropriate safety techniques and measures to ensure that the system meets the required |
| SIL. |
| |
| #. **Verification and Validation**: The standard requires rigorous testing and evaluation of the |
| safety-related system to ensure that it meets the specified SIL and other safety requirements. |
| This includes verification of the system design, validation of the system's functionality, and |
| ongoing monitoring and maintenance of the system. |
| |
| #. **Documentation and Traceability**: The IEC 61508 standard requires a comprehensive |
| documentation process to ensure that all aspects of the safety-related system are fully |
| documented and that there is full traceability from the safety requirements to the final system |
| design and implementation. |
| |
| Overall, the IEC 61508 standard provides a framework for the design, development, and |
| implementation of safety-related systems that aims to reduce the risk of accidents and improve |
| overall safety. By following the standard, organizations can demonstrate that their safety-related |
| systems are designed and implemented to the required level of safety integrity. |
| |
| Quality |
| ******* |
| |
| Quality is a mandatory expectation for software across the industry. The code base of the project |
| must achieve various software quality goals in order to be considered an auditable code base from a |
| safety perspective and to be usable for certification purposes. However, software quality is not an |
| additional requirement imposed by functional safety standards. Functional safety treats quality |
| as an existing pre-condition, and therefore the "quality managed" status should be pursued for any |
| project regardless of its functional safety goals. The following list describes the quality goals |
| which need to be reached to achieve an auditable code base: |
| |
| 1. Basic software quality standards |
| |
| a. :ref:`coding_guidelines` (including: static code analysis, coding style, etc.) |
| b. Requirements and requirements tracing |
| c. Test coverage |
| |
| 2. Software architecture design principles |
| |
| a. Layered architecture model |
| b. Encapsulated components |
| c. Encapsulated single functionality (if not reasonable and manageable in safety) |
| |
| Basic software quality standards - Safety view |
| ============================================== |
| |
| In this section the Safety Committee describes why the quality goals listed above are needed as a |
| pre-condition and what needs to be done to achieve an auditable code base from the safety |
| perspective. Generally speaking, all of these quality measures serve to minimize the error rate |
| during code development. |
| |
| Coding Guidelines |
| ----------------- |
| |
| The coding guidelines are the basis for a common understanding, a unified ruleset, and a unified |
| development style for industrial software products. For safety, the coding guidelines are |
| essential and serve a purpose beyond providing a unified ruleset: it must be possible to prove |
| that the developers follow a unified development style in order to prevent **systematic errors** |
| in the software development process and thus to minimize the overall **error rate** of the |
| complete software system. |
| |
| The **IEC 61508 standard** also treats the use of coding standards / guidelines as a pre-condition |
| and recommends them to reduce the likelihood of errors. |
| |
| Requirements and requirements tracing |
| ------------------------------------- |
| |
| Requirements and requirements management are important not only for software development, but also |
| for safety. On the one hand, requirements specify in detail and on a technical level what the |
| software should do; on the other hand, they are a necessary tool to verify whether the described |
| functionality is implemented as expected. For this purpose, the requirements are traced down to |
| the code level. With requirements management and tracing in place, it can be verified whether the |
| functionality has been implemented and tested correctly, thus minimizing the systematic error rate. |
| |
| The IEC 61508 standard also highly recommends requirements and requirements tracing, which |
| effectively makes them a must-have for the certification. |
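
As a rough illustration of tracing requirements down to the code level, the following minimal C sketch links requirement identifiers to the tests that verify them and reports requirements that have no test. It is not the Zephyr project's tracing mechanism; the requirement IDs (``REQ-ADD-001`` etc.), the ``sat_add_u8()`` unit, and all helper names are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unit under test: 8-bit addition that saturates at 255. */
unsigned char sat_add_u8(unsigned char a, unsigned char b)
{
	unsigned int sum = (unsigned int)a + (unsigned int)b;

	return (sum > 255U) ? (unsigned char)255U : (unsigned char)sum;
}

/* Each test returns true when the requirement it verifies is met. */
static bool test_add_in_range(void)
{
	return sat_add_u8(2U, 3U) == 5U;
}

static bool test_add_saturates(void)
{
	return sat_add_u8(200U, 100U) == 255U;
}

/* One row per requirement; the IDs are invented for this sketch. */
struct trace_entry {
	const char *req_id;
	bool (*test)(void); /* NULL: no test traces to this requirement yet */
};

static const struct trace_entry trace_table[] = {
	{ "REQ-ADD-001", test_add_in_range },
	{ "REQ-ADD-002", test_add_saturates },
	{ "REQ-ADD-003", NULL },
};

#define TRACE_COUNT (sizeof(trace_table) / sizeof(trace_table[0]))

/* Number of requirements without any linked test (a tracing gap). */
size_t trace_count_untested(void)
{
	size_t n = 0;

	for (size_t i = 0; i < TRACE_COUNT; i++) {
		if (trace_table[i].test == NULL) {
			n++;
		}
	}

	return n;
}

/* Number of requirements whose linked test runs and fails. */
size_t trace_count_failing(void)
{
	size_t n = 0;

	for (size_t i = 0; i < TRACE_COUNT; i++) {
		if (trace_table[i].test != NULL && !trace_table[i].test()) {
			n++;
		}
	}

	return n;
}
```

In this sketch, ``trace_count_untested()`` reports one gap (the hypothetical ``REQ-ADD-003``), which is the kind of finding that requirements tracing is meant to surface before an audit.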
| |
| Test coverage |
| ------------- |
| |
| High test coverage, in turn, provides evidence that the code does precisely what it was developed |
| for and does not execute any unforeseen instructions. If the entire code is tested and has a high |
| (ideally 100%) test coverage, faulty changes are also detected quickly, which further minimizes |
| the error rate. However, safety imposes its own requirements on test coverage, and the IEC 61508 |
| standard prescribes various metrics that must be considered for the SIL 3 / SC 3 target. Among |
| other things, the following must be fulfilled: |
| |
| * Structural test coverage (entry points) 100% |
| * Structural test coverage (statements) 100% |
| * Structural test coverage (branches) 100% |
| |
| If 100% cannot be reached (e.g. statement coverage of defensive code), the uncovered part needs to |
| be described and justified in the documentation. |
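
The difference between statement coverage and branch coverage can be seen in a small C sketch. The function ``clamp_to_limit()`` is a hypothetical helper invented for this example and is not part of Zephyr.

```c
/* Hypothetical helper: limit a value to an upper bound. */
int clamp_to_limit(int value, int limit)
{
	int result = value;

	/* One decision with two outcomes (taken / not taken). */
	if (value > limit) {
		result = limit;
	}

	return result;
}
```

A single test calling ``clamp_to_limit(10, 5)`` enters the function and executes every statement, so entry point and statement coverage are both 100%, but only the "taken" outcome of the decision is exercised. A second test such as ``clamp_to_limit(3, 5)`` is needed to cover the "not taken" outcome and reach 100% branch coverage for this function.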
| |
| Software architecture design principles |
| ======================================= |
| |
| To create and maintain a structured software product, individual software architecture designs |
| must also be considered and implemented in accordance with safety standards, because some designs |
| and implementations are not reasonable in a safety context. Only then can the overall software and |
| code base be used as auditable code. However, most of these software architecture designs have |
| already been implemented in the Zephyr project and need to be verified by the Safety Committee / |
| Safety Working Group and the safety architect. |
| |
| Layered architecture model |
| -------------------------- |
| |
| The **IEC 61508 standard** strongly recommends a modular approach to software architecture. This |
| approach has been pursued in the Zephyr project from the beginning with its layered architecture. |
| The idea behind this architecture is to organize modules or components with similar functionality |
| into layers. As a result, each layer can be assigned a specific role in the system. For safety, |
| this model has the advantage that interfaces between different components and layers can be shown |
| at a very high level, so that the safety-relevant functionality can be identified and its extent |
| limited. Furthermore, various analyses and documentation that are important for certification and |
| for the responsible certification body can be built on top of this architecture. |
| |
| Encapsulated components |
| ----------------------- |
| |
| Encapsulated components are an essential part of the architecture design for safety. |
| The most important aspect is the separation of safety-relevant components from non-safety-relevant |
| components, including their associated interfaces. This ensures that the components have no |
| **repercussions** on other components. |
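
One common way to encapsulate a component in C is to expose only a narrow function interface and keep the state private to a single source file. The sketch below illustrates the idea under that assumption; the ``safe_counter_*`` names and the limit of 3 are hypothetical and do not refer to any existing Zephyr API.

```c
#include <stdbool.h>
#include <stdint.h>

/* ---- Public interface (would normally live in a header) ---- */
bool safe_counter_increment(void);
uint32_t safe_counter_get(void);

/* ---- Private implementation (would normally live in one .c file) ---- */
#define SAFE_COUNTER_MAX 3U

/* File-scope static: other components cannot name or modify this state. */
static uint32_t safe_counter_value;

/* Returns false instead of overflowing past the configured maximum. */
bool safe_counter_increment(void)
{
	if (safe_counter_value >= SAFE_COUNTER_MAX) {
		return false;
	}

	safe_counter_value++;
	return true;
}

uint32_t safe_counter_get(void)
{
	return safe_counter_value;
}
```

Because other components can only reach the state through the two interface functions, the interface is the single place where interactions between safety-relevant and non-safety-relevant code have to be analyzed.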
| |
| Encapsulated single functionality (if not reasonable and manageable in safety) |
| ------------------------------------------------------------------------------ |
| |
| Another requirement for the overall system and software environment is that individual |
| functionalities can be disabled within components. If a functionality is unacceptable for safety |
| (e.g. complete dynamic memory management), it should be possible to turn it off. The Zephyr |
| Project already offers this possibility through Kconfig and its flexible configurability. |
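
In Zephyr, enabled Kconfig options are made available to C code as ``CONFIG_``-prefixed preprocessor macros, so a functionality can be compiled out entirely. The following sketch shows the general pattern with an invented option, ``CONFIG_EXAMPLE_DYNAMIC_BUFFERS``, and hypothetical ``example_*`` functions; it is an illustration of the technique, not an existing Zephyr option or API.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical option. In a real build the macro would come from the
 * Kconfig-generated configuration header; here it is left undefined,
 * which models a safety configuration with the feature turned off.
 */
/* #define CONFIG_EXAMPLE_DYNAMIC_BUFFERS 1 */

#define EXAMPLE_POOL_SIZE 64U

#ifdef CONFIG_EXAMPLE_DYNAMIC_BUFFERS

/* Feature enabled: buffers come from the heap. */
void *example_buffer_get(size_t size)
{
	return malloc(size);
}

int example_uses_heap(void)
{
	return 1;
}

#else

/* Feature disabled: a single statically allocated buffer, no heap use. */
static unsigned char example_pool[EXAMPLE_POOL_SIZE];

void *example_buffer_get(size_t size)
{
	return (size <= EXAMPLE_POOL_SIZE) ? example_pool : NULL;
}

int example_uses_heap(void)
{
	return 0;
}

#endif
```

With the option disabled, the heap-based variant is not part of the compiled image at all, which keeps it out of the code that has to be analyzed and tested for the safety scope.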
| |
| Processes and workflow |
| ********************** |
| |
| .. figure:: images/zephyr-safety-process.svg |
| :align: center |
| :alt: Safety process and workflow overview |
| :figclass: align-center |
| |
| Safety process and workflow overview |
| |
| The diagram describes the rough process defined by the Safety Committee to ensure safety in the |
| development of the Zephyr project. To ensure understanding, a few points need to be highlighted and |
| some details explained regarding the roles of the safety architect and the Safety |
| Committee in the whole process. The diagram only describes the paths that are possible when a |
| change is related to safety. |
| |
| #. On the main branch, the safety scope of the project should be identified, which typically |
| represents a small subset of the entire code base. This subset should then be made auditable |
| during normal development on “main”, which means that special attention is paid to quality goals |
| (`Quality`_) and safety processes within this scope. The Safety Architect works alongside the |
| Technical Steering Committee (TSC) in this area, monitoring the development process to ensure |
| that the architecture meets the safety requirements. |
| |
| #. At this point, the safety architect plays an increasingly important role. For PRs/issues that |
| fall within the safety scope, the safety architect should ideally be involved in the discussions |
| and decisions of minor changes in the safety scope to be able to react to safety-relevant |
| changes that are not conformant. If a pull request or issue introduces a significant and |
| influential change or improvement that requires extended discussion or decision-making, the |
| safety architect should bring it to the attention of the Safety Committee or the Technical |
| Steering Committee (TSC) as appropriate, so that they can make a decision on the best course of |
| action. |
| |
| #. This section describes the certification side. At this point, the code base has to be in an |
| "auditable" state, and ideally no further changes should be necessary or made to the code base. |
| There is still a path from the main branch to this area. This is needed in case a serious bug or |
| important change is found or implemented on the main branch in the safety scope, after the LTS |
| and the auditable branch were created. In this case, the Safety Committee, together with the |
| safety architect, must decide whether this bug fix or change should be integrated into the LTS |
| so that the bug fix or change could also be integrated into the auditable branch. This |
| integration can take one of three forms: a code change only, an update to the safety |
| documentation only, or both. |
| |
| #. This describes the safety process required for the certification itself. Here, the final |
| analyses, tests, and documents that are required during the certification, as prescribed by the |
| certifying authority and the standard being certified against, are created and conducted. If the |
| certification body approves everything at this stage and the safety process is completed, a |
| safety release can be created and published. |
| |
| #. This transition from the auditable branch to the main branch should only occur in exceptional |
| circumstances, specifically when something has been identified during the certification process |
| that needs to be quickly adapted on the “auditable” branch in order to obtain certification. In |
| order to prevent this issue from arising again during the next certification, there needs to be |
| a path to merge these changes back into the main branch so that they are not lost, and to have |
| them ready for the next certification if necessary. |
| |
| .. important:: |
| Safety should neither block the project nor limit its room to grow in any way. |
| |
| .. important:: |
| **TODO:** Find and define ways, guidelines and processes which minimally impact the daily work |
| of the maintainers, reviewers, contributors and the safety architect, but which are still |
| suitable for safety. |