How can I determine if my design is safe enough? What’s wrong with Toyota and why isn’t it happening to all the car companies? Why is it safe to fly? Is it safe to drive a Toyota? GM? Ford? Others?
We get the term “Fly-By-Wire” (FBW) from aviation. FBW enables unstable airplanes like the B-2 and F-117 (Stealth) to fly at all while still feeling like a normal airplane at the pilot’s controls. However, from its beginning there were problems with digital electronics becoming unpredictable. Solving them led to the safety technologies in widespread use today. Digital electronic FBW control IS NOT SAFE by itself.
FBW control has been applied to autos for about 20 years now, beginning most notably with a safety enhancement, Anti-lock Braking Systems (ABS). But only recently has FBW been applied to safety-critical aspects of automotive designs (braking and acceleration, beyond the ABS enhancement).
So what went wrong? Why did the system fail? Simple:
1. Design Adequacy Failure – Toyota’s design has been proven unsafe
2. Business Management Failure – Mr. Toyoda (CEO) appears dishonest for profit
3. Regulatory Failure – NHTSA, the auto safety regulator, has not yet adopted the approach the FAA already uses for aviation
4. Congress and the public so far do not appear adequately informed. Safety is a lesser-known engineering domain, but it is not new or even unusual any more
How do we prevent similar failures in the future? Simple: carry this overview into the political and regulatory environment until it becomes common sense, just like resetting your PC when it doesn’t work. Here we simply need regulation similar to DO-178B and the FAA’s AC 23.1309-1D applied to the auto industry.
The most common FBW vehicle systems in use today appear to be simplex systems (at least from the outside). This is an immediate attention grabber for anyone familiar with the safety-critical systems common in aviation and the military. However, that alone does not mean the designs are unsafe. There is a wide range of techniques to achieve a desired level of safety:
· Flex Safe – shutting off function or bending out of the way (eg: thermal protection)
· Fail Safe – quit instead of full-throttle
· Safety override – like an emergency brake that also disables the throttle (allows an operator to take over)
· Redundancy – multiple designs work side-by-side so when one goes crazy the others take over. Three or more work most reliably because the errors are easily “out-voted”
· Fault Tolerance – like multiple computers, so if one fails the system continues working. Re-configuring and graceful degradation are forms of this.
· Graceful degradation – with multiple elements, when one fails the others still work (like a pixel on a TV, or a cell in a radar)
· Fault Recovery – like a computer that resets itself after voting or timing checkers flag it as failed.
· Fault Detection and Isolation – key for software to act in time to prevent bad things. A watchdog timer is a great example.
· Design For Six Sigma (DFSS) – applicable to tolerance analysis and manufacturing capabilities to maintain set performance required for mission success. Notable also are common best practices applicable to achieving DO-178B and SEI/CMMI Certifications.
These techniques work and people stake their lives on them every day. Most notable is commercial aviation.
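The "out-voting" idea behind triple redundancy can be sketched in a few lines. This is only an illustration, not any manufacturer's design: the function name `vote`, the `tolerance` value, and the pedal readings are all assumptions made up for the example. It uses the median of three channels as the voted value, so one wild channel cannot drag the output, and it reports which channels disagree.

```python
from statistics import median

def vote(readings, tolerance=2.0):
    """Return (voted_value, suspect_channels) for three redundant readings.

    The median is the voted value, so a single runaway channel cannot
    move the output. Channels further than `tolerance` from the median
    are reported as suspect so they can be isolated.
    """
    if len(readings) != 3:
        raise ValueError("this sketch expects exactly three channels")
    voted = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - voted) > tolerance]
    return voted, suspects

# Hypothetical pedal-position readings in percent; channel 2 is stuck at full scale.
voted, suspects = vote([12.1, 11.8, 100.0])
print(voted, suspects)  # 12.1 [2]
```

A simplex system has nothing to compare against, which is why the bad reading in channel 2 would go straight to the throttle in that case.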
Single Event Upsets (SEUs) are a generalization of all the conditions which can cause memory bits to invert their value (0 becomes a 1 or 1 becomes a 0). Many studies have revealed that SEUs can be caused in many ways including:
· Electro-Magnetic Interference (EMI) (eg: lightning, static, power glitches)
· radiation (eg: space particles, muons)
· mechanical failure (of the electrical connections)
· and last but not least software design adequacy (includes failure to accommodate hardware errors)
We see SEUs all the time in our common electronics in harmless ways. For example, having to reset the PC. OK, that one is also caused by software design errors such as “memory leaks”, not to mention malware, and so on. But it also highlights the role of software in digital systems to tolerate, induce, or recover from SEUs. Once errors occur, software can minimize or exacerbate the effects. DO-178B, the software standard the FAA recognizes, specifies a wide range of design techniques (best practices, inspection, analysis, test) that minimize both the occurrences and the resulting effects. These also increase software development costs.
So how far must the design go? The guidelines are specified in the FAA’s AC 23.1309-1D. The approaches are well documented and straightforward, but they still represent enough complexity that there is both development cost and production cost associated with safety (like airbags add reasonable cost for a big safety improvement). The analysis starts with identifying all possible causes of errors in a formal assessment called the Failure Modes, Effects, and Criticality Analysis (FMECA). In cases where catastrophic loss of civilian life is at stake, the threshold of proof is a probability of less than 0.000,000,001 (10^-9) per flight hour.
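Here is a small sketch of how software can detect a flipped bit instead of acting on it. It stores each critical value together with its bitwise complement and checks the pair on every read. This is one common defensive technique, not something taken from DO-178B itself, and the 16-bit width and the function names are assumptions for the example.

```python
MASK = 0xFFFF  # assumption: critical values are stored as 16-bit words

def store(value):
    """Return the value together with its bitwise complement."""
    value &= MASK
    return value, (~value) & MASK

def load(value, complement):
    """Return the value if the stored pair is still consistent, else None."""
    if (value ^ complement) != MASK:
        return None  # corruption detected; caller should fall back to a safe state
    return value

def flip_bit(word, bit):
    """Simulate an SEU by inverting one bit of a word."""
    return word ^ (1 << bit)

value, comp = store(1234)
print(load(value, comp))               # 1234
print(load(flip_bit(value, 9), comp))  # None
```

Any single bit flip in either word breaks the complement relationship, so the software gets a chance to recover (for example by re-reading the sensor or resetting) rather than trusting a corrupted number.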
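To get a feel for what that threshold demands, here is a back-of-the-envelope sketch comparing a simplex channel with a two-out-of-three voted system. The per-hour channel failure probability below is a made-up number for illustration, and the formula assumes the three channels fail independently, which a real analysis would have to justify (a shared software bug, for instance, breaks that assumption).

```python
def two_of_three_failure(p):
    """Probability that at least two of three independent channels fail."""
    return 3 * p**2 * (1 - p) + p**3

TARGET = 1e-9     # catastrophic threshold quoted above, per hour
p_channel = 1e-5  # hypothetical per-hour failure probability of one channel

simplex = p_channel
triplex = two_of_three_failure(p_channel)

print(simplex <= TARGET)  # False
print(triplex <= TARGET)  # True
```

Under these assumptions a single channel misses the target by four orders of magnitude, while three voted channels of the same quality meet it.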
The technical complexity of design for safety does not mean commercial design has to take on too much either. Rather, the critical things are optimally managed when they are adequately analyzed during development. Existing products even get a legacy benefit from time-on-equipment test results (Maturity Models, etc.).
One key to developing effective designs is cost-oriented design decisions. There are many competing interests to prioritize. The design techniques to achieve safety levels all affect system costs (both purchase and maintenance). Short-sighted management CAN select unsafe designs while assuming risk. For example, if other cars use simplex systems for FBW in autos, does that mean they’re not safe either? NO! It only means that we ourselves don’t know the answer. It also implies a business ethics motive for auto-company leaders to learn up on this topic fast!
Ideally business management can effectively apply technology solutions to the design issues at hand with realistic math based analysis and design (beyond just putting pieces together). The cost of such should/would/is minimal compared to the larger potential losses such as Toyota faces today.
Fully redundant systems may provide the most robust solutions, but there are a lot of fail-safe techniques that can be applied too. Brake-switch overrides of the throttle, e-brake overrides, and other mitigation techniques can provide cost-adequate solutions in the short to medium term, and ultimately FMECA-driven development will become a natural part of modern control system development. Toyota’s public experience is the alert for good business management.
The auto industry already has many regulatory-certification requirements for safety features like seat belts, crash impact, and airbags (plus emissions, etc.). The safety rules come from NHTSA, which sits in the same Department of Transportation as the FAA.
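A brake-switch override is simple enough to show in a few lines. This is a minimal sketch under assumed names and signals (`pedal_percent`, `brake_pressed`, `pedal_valid`), not the logic of any production engine controller.

```python
def commanded_throttle(pedal_percent, brake_pressed, pedal_valid=True):
    """Return the throttle command in percent (0.0 to 100.0).

    Two mitigations are combined:
    - fail safe: an untrusted pedal signal commands idle, not full throttle
    - safety override: a pressed brake always wins over the accelerator
    """
    if not pedal_valid:
        return 0.0
    if brake_pressed:
        return 0.0
    # Clamp to the physical range so an out-of-range reading cannot over-command.
    return max(0.0, min(100.0, pedal_percent))

print(commanded_throttle(40.0, brake_pressed=False))   # 40.0
print(commanded_throttle(100.0, brake_pressed=True))   # 0.0
```

The override does not explain why the throttle signal went bad. It only guarantees that the driver's brake input takes priority, which is why it is a short-term mitigation and not a substitute for the analysis.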
RTCA DO-178B (recognized by the FAA) – Software Considerations in Airborne Systems and Equipment Certification
And AC 23.1309-1D specifies (in figure 2, page 21) these failure-probability limits per flight hour for catastrophic (life-critical) failure conditions:
· Class I (Single Reciprocating Engine aircraft): 10^-6
· Class IV (Typical Commuter Category): 10^-9
Similar NHTSA specifications could help the automotive industry. They would also spread the wisdom of the avionics industry into autos and beyond, into other industries facing similar safety-critical aspects in their designs.
It IS possible that other auto companies have done adequate safety designs. However, none are claiming it in public which means they’ve more likely selected short sighted cost avoidance while assuming a probabilistic risk that may manifest eventually. Wise companies could get a jump on their analysis according to FAA standard practices to see where they stand before they have problems like Toyota does.
A safety override for all vehicles was recently mentioned as an option being considered. That may keep someone safe in the short term, but I’d feel safer if an appropriate analysis were required like we have for safe airplanes. NHTSA already regulates a lot of safety aspects of automotive designs, so a control-safety certification would only be a small addition. Right now autos face a plethora of separate safety requirements rather than the FAA approach, so what may appear simple may not be so easy.
Can you send a letter to your representatives? Can you demand a good story on the topic from your new car dealer?
An old car without a reputation for such problems IS more likely to be safe than a brand new one without an FMECA-based certification.
Many companies already advertise their safety (the white-tire cartoon comes to mind). And there are also many who link themselves to the jet industry’s image (remember the fins on the old Caddies?). Your most recent airplane ride WAS controlled by a FADEC (Full Authority Digital Engine Control – a very common FBW that’s life critical).
Fly By Wire (FBW) technology is a good thing for aviation and will be for automotive development too. There is a time and place for business cost cutting as well as regulation. We need to regulate the right things here. So rather than an over-ride regulation I’m praying for something more lasting and specific to the public needs …
Get the real science into operation like we’ve already achieved via the FAA, which sits in the same Department of Transportation as NHTSA.