Failure Mode & Effects Analysis (FMEA) is a process in which you consider what can go wrong, how likely it is to happen, how likely it is that a user will be affected by the problem, and what the consequences are. Used correctly, it is a valuable design tool that can save costly iterations during the design cycle. Too often, it is just a document created to complete a checklist.
This is a short discussion intended to introduce you to the value of the FMEA and to help you get started. There are many good references, ranging from Wikipedia to the SAE (Society of Automotive Engineers) standards and ISO 14971 (Application of risk management to medical devices).
The first challenge is assembling an FMEA team to work together on the analysis. Ideally, you would have individuals with technical knowledge ranging from product management, to product design and development, to industrial engineering. In a small company, this is just a few of you over coffee. When you start, you may be planning on contract manufacturing and have not yet identified who the manufacturer will be. If you are using a product design and development consultancy, check whether the team members actually have such experience.
The FMEA process starts with carefully analyzing the system design, the nature of the components and materials used, and how the product is operated by the end-users. The objective is to uncover potential failure modes and reasonably foreseeable misuses. Because of the variety of failure opportunities, it is important that the FMEA team brings diverse perspectives. Some are reluctant to list all the failure modes lest they be disclosed in a liability case. Actually, the absence of this documentation is worse. Showing that you considered a failure makes a much stronger case than having no record at all. Do not consider this legal advice, but rather an industry best practice.
The FMEA then has three different considerations, or categories, each rated from 1 (least severe) to 10 (most severe) for each failure mode. The three ratings are multiplied together to give a score, called the Risk Priority Number (RPN), of 1 to 1,000. As a rough rule of thumb, companies look at anything over 125 as needing immediate correction. That threshold is subjective, however, since most of these values are best estimates based on experience.
First, how severe is the consequence of the failure mode? A noisy fan may be a minor nuisance and would rate low, particularly for an industrial user. A safety hazard that leads to a house fire is very severe. Personal injury, death, or a major regulatory violation is scored a 10. Reducing the severity of a failure often requires engineering changes or design considerations for the product.
Second, how likely is a failure to happen? A 1:1000 probability of failure might not be too bad if you are only building two or three units. But if you plan on selling 10,000 per month, such a probability of failure is noteworthy and may need changes to the design, or special treatment within the production environment. For medical devices, the likelihood of failure could be reduced through training requirements, or through specific inclusions in the Instructions For Use (IFU) documentation that accompanies such products.
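To make the effect of production volume concrete, here is a minimal sketch of the arithmetic. It assumes that failures are independent from unit to unit, which is a simplification, and the function names are illustrative rather than taken from any standard.

```python
def expected_failures(failure_probability: float, units_built: int) -> float:
    """Average number of failing units, assuming independent failures."""
    return failure_probability * units_built


def chance_of_any_failure(failure_probability: float, units_built: int) -> float:
    """Probability that at least one unit fails, assuming independence."""
    return 1.0 - (1.0 - failure_probability) ** units_built


P_FAIL = 1 / 1000  # the 1:1000 figure used in the text

# Compare a build of three units with a month of 10,000 units.
for units in (3, 10_000):
    print(
        f"{units:>6} units: "
        f"expected failures = {expected_failures(P_FAIL, units):.3f}, "
        f"chance of at least one = {chance_of_any_failure(P_FAIL, units):.4f}"
    )
```

With three units, a failure is unlikely to be seen at all; at 10,000 units per month, the same rate implies roughly ten failures every month.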
Third, how likely is the failure to be detected? A failure that leads to major problems may not be so bad if there is some indicator to alert the user. As automotive brake pads wear, strips of metal make a loud chirping sound when the pads approach their end-of-life. Losing braking functionality is severe, but the concern over such a failure mode is mitigated because the design incorporates a method of detection, which in turn improves the detection score.
Finally, it is important to emphasize that the team needs to focus on the resulting Risk Priority Number (RPN), which is the product of the above three scores. Priority should go to the high-RPN failure modes; there is a diminishing return on team effort for low-RPN failure modes, and a team can easily be trapped into spending excessive time on them. Hence, the team’s experience with FMEA, combined with leadership, ensures efficient and effective efforts.
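The scoring and prioritization described above can be sketched in a few lines. This is only an illustration: the `FailureMode` class, the `prioritize` helper, and every score below are hypothetical, and the 125 cutoff is the subjective rule of thumb mentioned earlier, not a fixed requirement.

```python
from dataclasses import dataclass

ACTION_THRESHOLD = 125  # rule-of-thumb cutoff; subjective, adjust to suit


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (least severe) to 10 (most severe)
    occurrence: int  # 1 (very unlikely) to 10 (very likely)
    detection: int   # 1 (easily detected) to 10 (unlikely to be detected)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 scale.
        for label in ("severity", "occurrence", "detection"):
            score = getattr(self, label)
            if not 1 <= score <= 10:
                raise ValueError(f"{label} must be 1..10, got {score}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three scores."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes sorted from highest RPN to lowest."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores, chosen only to illustrate the ranking.
modes = [
    FailureMode("noisy fan", severity=2, occurrence=5, detection=2),
    FailureMode("overheating leading to fire", severity=10, occurrence=2, detection=7),
    FailureMode("worn brake pads", severity=9, occurrence=6, detection=2),
]

for mode in prioritize(modes):
    flag = "ACT NOW" if mode.rpn > ACTION_THRESHOLD else "monitor"
    print(f"{mode.rpn:>4}  {flag:<8} {mode.name}")
```

Sorting by RPN keeps the team's attention on the top of the list; where to stop working down it remains a judgment call.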
To help understand the ratings, here is an example. A tank designed to hold toxic chemicals has a valve for connecting service hoses. The FMEA team discovers a potential scenario in which that valve could fail. The severity is deemed a 10, as a leak into the environment would be a regulatory violation (at the least). You could never reduce that severity, short of warning users not to use the tank for toxic chemicals or requiring a catch basin (deemed unreasonable for just this case). The occurrence score may be low if it is judged that the valve will not leak under normal circumstances. However, what about extreme circumstances? Consider freezing that prevents the valve from closing. Or, if it is known that the tank will be used with critical infrastructure, extreme natural events such as earthquakes or hurricanes would have to be considered. For such a scenario, the FMEA team may end up recommending a redundant valve to reduce the occurrence score (as two valves would have to fail). Next, perhaps the detection score is deemed high (unfavorable). To improve detection (reduce the rating), a sensor could be designed in to detect whether the first valve fails to close. The analysis could then cascade into further considerations: the safe operation of the second valve, failures common to both valves, or new failure scenarios introduced by the design changes. From this example, you can see that many lower-level items sit beneath a single top-level specification such as "the system must meet environmental and safety regulations."
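As a rough illustration of how the tank example might play out numerically, the sketch below tracks the RPN through each mitigation. The occurrence and detection scores are hypothetical placeholders; only the severity of 10 comes from the example itself.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: the product of the three 1-10 scores."""
    return severity * occurrence * detection


SEVERITY = 10  # from the example: a leak is at least a regulatory violation

# (design stage, occurrence score, detection score) -- hypothetical values
stages = [
    ("original single valve", 4, 8),
    ("add redundant valve", 2, 8),        # lowers occurrence only
    ("add valve-closure sensor", 2, 3),   # lowers detection score only
]

history = []
for label, occurrence, detection in stages:
    score = rpn(SEVERITY, occurrence, detection)
    history.append(score)
    print(f"{label:<26} S={SEVERITY} O={occurrence} D={detection} RPN={score}")
```

Note that severity stays at 10 throughout: the design changes lower the RPN only through the occurrence and detection scores.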
Don Herres is a Product Design Expert with decades of experience in electrical and mechatronic product development projects, an IEEE Senior Life Member, and a mentor to many electrical engineers over the years. He has led, and contributed to, countless FMEA analysis teams.