Reliability

4 min read Last updated Fri Jun 12 2026 01:59:21 GMT+0000 (Coordinated Universal Time)

A system is reliable when it rarely fails. No system can be made completely failure-free.

Metrics

MTTF

Mean Time To Failure. The statistical average time from first use until the first failure of a non-repairable component or system.

MTTF=0R(t)dt\text{MTTF} = \int_0^{\infty} R(t)\, \text{d}t

where R(t)R(t) is the reliability function (probability of surviving to time tt). For a constant failure rate λ\lambda:

MTTF=1λ\text{MTTF} = \frac{1}{\lambda}

Used for components that are discarded on failure: sensors, MCUs, batteries, LEDs.

MTTR

Mean Time To Repair. Considered for repairable systems such as servers and industrial machines.

MTBF

Mean Time Between Failures. The statistical average time between successive failures of a repairable system.

MTBF=MTTF+MTTR\text{MTBF} = \text{MTTF} + \text{MTTR}

For systems where repair time is negligible compared to operating time, MTBF \approx MTTF.

Used for systems that are repaired and returned to service: servers, industrial machines, repairable IoT gateways.

Number of Components vs. Metrics

When nn independent components are connected in series (the system fails if any one fails), their failure rates add:

λsystem=λ1+λ2++λn\lambda_{system} = \lambda_1 + \lambda_2 + \cdots + \lambda_n

Since MTTF=1/λ\text{MTTF} = 1/\lambda, the system MTTF is:

1MTTFsystem=1MTTF1+1MTTF2++1MTTFn\frac{1}{\text{MTTF}_{system}} = \frac{1}{\text{MTTF}_1} + \frac{1}{\text{MTTF}_2} + \cdots + \frac{1}{\text{MTTF}_n}

MTTFsystem\text{MTTF}_\text{system} is always shorter than the shortest individual MTTF. Adding more components strictly decreases system reliability.

For nn identical components each with MTTF =M= M:

MTTFsystem=Mn\text{MTTF}_\text{system} = \frac{M}{n}

Doubling the component count halves the system MTTF.

For components in parallel (redundant: system survives until all fail), reliability improves. With nn identical components each with failure rate λ\lambda, the system survives until the last one fails:

MTTFparallel=1λ(1+12++1n)=1λk=1n1k\text{MTTF}_\text{parallel} = \frac{1}{\lambda} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = \frac{1}{\lambda} \sum_{k=1}^{n} \frac{1}{k}

For two identical components in parallel: MTTFparallel=32λ=1.5×MTTFsingle\text{MTTF}_\text{parallel} = \frac{3}{2\lambda} = 1.5 \times \text{MTTF}_\text{single}.

Component Count Rule

Every additional component added to an IoT design has a cost on multiple axes simultaneously. Minimising component count is therefore a primary design objective, not merely a cost-cutting measure.

Cost

  • Bill of materials
    Each component has a unit price. At volume, even a $0.05 passive adds thousands of dollars across a production run.
  • PCB area
    More components require a larger board or finer pitch routing. Both increase fabrication cost and constrain enclosure options.
  • Assembly
    Each component is a placement operation on the pick-and-place machine. More placements mean longer cycle time and higher assembly cost per unit.
  • Testing
    Every component is a potential failure point during manufacturing test. More components increase test fixture complexity and time.
  • Supply chain
    Each distinct part number is a procurement dependency. Shortages, end-of-life notices, or single-source suppliers create supply risk.

Points of Failure

Components in series multiply failure risk. If each component has reliability RiR_i (probability of surviving a given period), the system reliability is:

Rsystem=R1×R2××RnR_{system} = R_1 \times R_2 \times \cdots \times R_n

Each Ri<1R_i < 1, so RsystemR_{system} strictly decreases with every added component. Equivalently, system MTTF for series components satisfies:

1MTTFsystem=1MTTF1+1MTTF2+\frac{1}{\text{MTTF}_\text{system}} = \frac{1}{\text{MTTF}_1} + \frac{1}{\text{MTTF}_2} + \cdots

MTTFsystem\text{MTTF}_\text{system} is always shorter than the shortest individual MTTF.

IoT devices are often deployed in locations with no on-site maintenance. A field failure requires a technician visit or a full device recall. The cost of a single field failure (labour, logistics, customer impact) vastly exceeds the cost saved by adding the component in the first place.

Power Consumption

Every component draws idle current. In battery-powered IoT devices, the sum of all idle currents determines battery life. Removing one component that draws 10 μA10\ \mu\text{A} continuously can extend battery life by weeks in a low-duty-cycle device.

Design Complexity

More components introduce more potential interaction effects: coupling, noise, voltage drop, thermal interference. Each interaction is a debugging surface. Fewer components means fewer failure modes to analyse and certify.

Was this helpful?