Challenges with Software Risk Analysis

ISO 14971 Risk Analysis

Identifying safety risks in medical devices is a challenging and laborious process. The process standard, ISO 14971, is a systematic, total product risk management lifecycle process to identify, control, and evaluate risk, where risk is defined as the combination of severity of the harm (to people, property, or environment) and probability of occurrence of the harm. ISO 14971 defines a general philosophy and process framework, but the manufacturer must define and implement a specific policy and procedure for the actual methods it will use to identify, control, and evaluate risks. Different organizations and companies use different methods and approaches to risk management while conforming to the standard. The effectiveness of these methods and approaches varies and can depend greatly on the experience and background of the analysis team, the complexity and mix of mechanical, electrical, optical, and software elements that make up the system, and other factors.

As difficult and challenging as this system-level risk analysis can be, many manufacturers experience additional difficulty performing effective software risk analysis or, in many cases, neglect software risk analysis altogether. This can be further exacerbated for SaMD (Software as a Medical Device). Each of the common methods has its own advantages and disadvantages. Below, we have outlined some comments on the different methods and some advice on applying each.

Common Risk Analysis Methods

Bottom-up Analysis

When using bottom-up analysis methods, one starts the process, as the name implies, at the “bottom” and works “up” from the lowest layers of the design toward the highest. When applied to software, one might consider the bottom to represent drivers and other software interacting directly with hardware. For example, the analysis team should examine each failure mode of each driver to understand how failures would behave and the possible system/device-level outcome. The team would consider common software design and coding failure modes such as aliasing errors, stale data errors, misconfiguring the hardware, not detecting hardware errors, etc. For SaMD products, the bottom might be the failure modes of Off-The-Shelf / SOUP (Software of Unknown Provenance) components, analyzed to understand how those failures would manifest at the “device” level. One common bottom-up approach is FMEA (Failure Modes and Effects Analysis).
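As a rough illustration of the bottom-up idea, the sketch below records a few driver-level failure modes and groups them by the device-level effect they could lead to. The device, component names, and effects are hypothetical and chosen only for illustration; a real FMEA worksheet would carry many more columns (causes, controls, severity, and so on) as defined by the manufacturer's own procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str      # lowest-layer item under analysis, e.g. a driver
    mode: str           # how that item can fail
    device_effect: str  # outcome traced upward to the device level


# Hypothetical worksheet rows for an illustrative infusion device.
WORKSHEET = [
    FailureMode("pump motor driver", "hardware misconfigured", "over-delivery"),
    FailureMode("flow sensor driver", "stale data returned", "over-delivery"),
    FailureMode("flow sensor driver", "hardware error not detected", "under-delivery"),
]


def effects_by_outcome(rows):
    """Group failure modes by the device-level effect they can lead to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.device_effect].append((row.component, row.mode))
    return dict(grouped)


if __name__ == "__main__":
    for effect, causes in effects_by_outcome(WORKSHEET).items():
        print(effect, "<-", causes)
```

Grouping by device-level effect is one simple way to see where several low-level failure modes converge on the same outcome.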

Sound easy? Probably not. Typically this is a very laborious and tedious process involving multiple experts. Often, interactions between system components and the higher layers of the architecture are missed or poorly understood in the analysis. This is exacerbated by poorly architected systems with unneeded coupling and dependencies. Many times, the analysis team may “give up” because the system-level outcomes are too difficult or seemingly impossible to understand. Schedule pressures can compound the frustration.

Top-down Analysis

Another approach is to start from the “top” (highest level of the architecture) and work “downward” through the design layers. This method attempts to first define the failure outcomes linked to some unwanted situation. With medical devices using ISO 14971, one first defines the possible harm that can result from failure modes of the device and these become the “top” portion of the analysis. For example, one might identify “overdose” as the outcome of one or more possible failure modes.

Once the top-level outcomes are identified, the analysis team works downward through the architecture and design, identifying in the “next layer down” the situations that could lead to the hazard in the layer above. This process is repeated layer by layer, in theory until all layers are analyzed. A common technique employing this method is Fault Tree Analysis.
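The layer-by-layer descent can be sketched as a small fault tree. In the minimal example below, a tree is either a basic event (a string) or a tuple of a gate ("OR" or "AND") and its children, and `cut_sets` lists the minimal combinations of basic events that lead to the top-level outcome. The tree contents, the representation, and the function name are all assumptions made for illustration, not a prescribed method.

```python
from itertools import product

# Hypothetical fault tree for the top-level outcome "overdose".
# An AND gate models a pathway where a risk control must also fail.
OVERDOSE_TREE = (
    "OR",
    [
        ("AND", ["dose calculation error", "dose limit check fails"]),
        "motor driver runs past commanded volume",
    ],
)


def cut_sets(node):
    """Return the minimal sets of basic events that cause `node`."""
    if isinstance(node, str):  # basic event (leaf of the tree)
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any single child's cut set is enough to cause the parent.
        result = [s for sets in child_sets for s in sets]
    elif gate == "AND":
        # One cut set from every child must occur together.
        result = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unknown gate: {gate}")
    # Drop any set that strictly contains another set, then remove duplicates.
    minimal = [s for s in result if not any(other < s for other in result)]
    return list(dict.fromkeys(minimal))


if __name__ == "__main__":
    for events in cut_sets(OVERDOSE_TREE):
        print(sorted(events))
```

Single-event cut sets stand out as pathways with no independent control, which is one way a team might decide where to look more closely.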

Top-down methods do have some advantages. One, a top-down analysis can start early in the project, as soon as the intended use is known, since Subject Matter Experts (SMEs) can surmise the top-level hazards from the intended use. Two, the analysis team will often recognize that significant risk control measures are already planned for a particular pathway in the fault tree and that these strongly mitigate the risk of failure via that pathway. That advantage is powerful: it can eliminate many hours of wasted analysis so that the team can focus on riskier areas. Even with optimizations such as this, top-down methods require time, but this time is valuable both for the outcome and for the learning that occurs during the analysis.

One downside of top-down methods is that it may be hard to work all the way down to the “bottom,” because some intermediate layers may not have a clear connection to the bottom layer.

Risk Determination

Of course, calculating risk from software failures using traditional “probability” is difficult. IEC/TR 80002-1 (https://www.softwarecpr.com/2017/06/iec-tr-80002-2-validation-of-regulated-systems/) gives insight into more qualitative approaches that use “likelihood of harm” rather than the probability of the software failure itself. This approach looks at the entire system and its environment to consider other factors that may affect whether the software failure would actually result in a system failure. This still gives the team the benefit of prioritizing failure modes, which is the goal of risk analysis.
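A qualitative prioritization of this kind might look like the sketch below. The ordinal scales, the example failure modes, and the rule of ordering by severity first and likelihood of harm second are all hypothetical; a manufacturer's procedure would define its own levels and its own way of combining them.

```python
# Hypothetical ordinal scales; a real procedure would define its own levels.
SEVERITY = {"negligible": 0, "minor": 1, "serious": 2, "critical": 3}
LIKELIHOOD_OF_HARM = {"remote": 0, "occasional": 1, "probable": 2}

# Illustrative entries: (software failure mode, severity, likelihood of harm).
# Likelihood of harm reflects the whole system and its environment,
# not the probability of the software failure alone.
FAILURE_MODES = [
    ("stale sensor data used in dose calculation", "critical", "occasional"),
    ("display shows outdated battery level", "minor", "probable"),
    ("dose limit check skipped", "critical", "remote"),
]


def prioritize(failure_modes):
    """Order failure modes by severity first, then by likelihood of harm."""
    return sorted(
        failure_modes,
        key=lambda fm: (SEVERITY[fm[1]], LIKELIHOOD_OF_HARM[fm[2]]),
        reverse=True,
    )


if __name__ == "__main__":
    for name, severity, likelihood in prioritize(FAILURE_MODES):
        print(f"{severity:>10} / {likelihood:<10} {name}")
```

The output is only a ranking for directing attention, not a quantitative risk estimate.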

Brian Pate
