A fail-safe system defaults to a state that causes no harm when a fault occurs.
The safe state is defined by the application. For a barrier guarding a hazard, defaulting to closed is fail-safe; defaulting to open is not.
Design Principle
Fail-safe design requires identifying the least-dangerous state for every possible failure mode, then engineering the system so every fault drives toward that state.
Two approaches:
- Normally energised
The safe state requires power. Failure cuts power and the system moves to the unsafe state. Avoid.
- Normally de-energised
The safe state is the unpowered state. Failure cuts power and the system defaults to safe. Preferred.
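As a rough illustration of the preferred approach, the sketch below models a barrier whose closed (safe) position is also its unpowered position. The `Barrier` class and its field names are hypothetical and chosen only for this example; a real design would enforce this behaviour in hardware, for instance with a spring or gravity return, rather than in software.

```python
from dataclasses import dataclass


@dataclass
class Barrier:
    """Hypothetical barrier that is held open only while its actuator is powered."""

    powered: bool = True
    open_commanded: bool = False

    @property
    def is_open(self) -> bool:
        # Open requires BOTH power and an explicit command.
        # Losing either one returns the barrier to closed, the safe state.
        return self.powered and self.open_commanded


barrier = Barrier(powered=True, open_commanded=True)
assert barrier.is_open

barrier.powered = False      # simulated power failure
assert not barrier.is_open   # closed again with no software action needed
```

The point of the sketch is that no code path has to run for the safe state to be reached: removing power is enough.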
Fail-Stop
A fail-stop system halts completely when a fault is detected rather than continuing in a degraded or undefined state.
Stopping is preferable to degraded operation because:
- Wrong output is worse than no output
Downstream systems and human operators can handle absence of data. They cannot safely handle incorrect data presented as correct.
- Corrupt state propagates
An MCU executing with corrupted memory or wrong sensor readings spreads errors through every system that depends on it.
- Degraded behaviour is often undetectable
A system that appears to be running hides the fault. A stopped system makes the fault immediately visible and diagnosable.
- Undefined states cause unpredictable physical effects
An actuator commanded by corrupted logic may drive a motor, open a valve, or apply a brake in an unintended way. The physical consequence of wrong action exceeds the consequence of no action in most safety-critical contexts.
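The fail-stop behaviour described above can be sketched as a controller that refuses to produce any output once it has seen an input it cannot trust. Everything here is an assumption made for illustration: the names `FailStopController` and `FailStopError`, the 0 to 100 plausibility range, and the placeholder control law. On a real MCU the halt would typically also force the outputs into their safe state.

```python
from typing import Optional


class FailStopError(RuntimeError):
    """Raised when the controller halts or is already halted (hypothetical name)."""


class FailStopController:
    # Plausible sensor range; assumed values for illustration only.
    MIN_READING = 0.0
    MAX_READING = 100.0

    def __init__(self) -> None:
        self.halted = False
        self.fault: Optional[str] = None

    def step(self, reading: float) -> float:
        """Return an actuator command, or halt if the input cannot be trusted."""
        if self.halted:
            # Stay stopped: no output is produced after a fault.
            raise FailStopError(self.fault)
        # This comparison is also False for NaN, so NaN readings halt too.
        if not (self.MIN_READING <= reading <= self.MAX_READING):
            self.halted = True
            self.fault = f"reading out of range: {reading}"
            raise FailStopError(self.fault)
        return reading * 0.5  # placeholder control law


ctrl = FailStopController()
print(ctrl.step(40.0))  # 20.0
try:
    ctrl.step(250.0)    # implausible reading: the controller halts
except FailStopError as err:
    print("halted:", err)
```

After the halt, every further call to `step` raises instead of returning a value, so downstream code sees an absence of output rather than a plausible-looking wrong one.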
Examples
Railway Crossing Example
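Option A: a warning light that is normally OFF and turns on when a train approaches. On failure, the light stays dark, which looks the same as no train coming. Dangerous.
Option B: a green light that is normally ON and turns off when a train approaches. On failure, the light goes dark, which is interpreted as unsafe to cross. Safe.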
Option B is fail-safe.
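The two options can be compared with a small simulation of a failed lamp. This is a minimal sketch: the function names are hypothetical, and it assumes a driver proceeds exactly when the signal reads as "clear" under the option in use.

```python
def lamp_output(commanded_on: bool, lamp_working: bool) -> bool:
    # A failed lamp is dark whatever the controller commands.
    return commanded_on and lamp_working


def driver_proceeds_option_a(warning_lamp_lit: bool) -> bool:
    # Option A: a lit lamp means "stop"; a dark lamp reads as "no train".
    return not warning_lamp_lit


def driver_proceeds_option_b(green_lamp_lit: bool) -> bool:
    # Option B: only a lit green lamp means "proceed"; dark means "stop".
    return green_lamp_lit


train_coming = True
lamp_working = False  # simulated lamp failure

# Option A commands the warning lamp ON when a train is coming.
a = driver_proceeds_option_a(lamp_output(train_coming, lamp_working))
# Option B commands the green lamp ON only when no train is coming.
b = driver_proceeds_option_b(lamp_output(not train_coming, lamp_working))

print("Option A, driver proceeds:", a)  # True: drives into the train's path
print("Option B, driver proceeds:", b)  # False: stops
```

Under Option B a failed lamp also stops drivers when no train is coming. That costs availability, not safety, which is the trade a fail-safe design accepts.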
Headlight Flash Convention
International standard: flash means “I am stopping, you go.” If both drivers flash, both stop.
Opposite convention: flash means “I am coming.” If both flash, both proceed, risking collision.
The first convention is fail-safe; the second is not.