“How can you trust a man who wears both a belt and suspenders? The man can’t even trust his own pants.” — Henry Fonda as Frank, in Once Upon a Time in the West
There’s a cliché in the movies that involves a paranoid urban apartment-dweller with a dozen or more locks, deadbolts, chains, and a security bar on their front door. When the bad guy shows up, though, he kicks in the hollow core door, snapping off the hinges held in place by 1” brass screws. Perhaps you’ve even been in such an apartment with this kind of “security”.
Someone thought, “well, if one lock is good, then two is better. And if two is good, then three is even better. And if three…” The next thing you know, there are locks and deadbolts all up and down the door, as many as there is room for. It looks silly. It is.
Common cause failure will always limit the effectiveness of redundant safety devices. Always. There comes a point at which extra devices simply don’t make things better.
Redundancy
Redundancy results when there is more than one feature serving the same purpose at the same point in the system. Two locks on the front door provide redundancy. A lock on the front door and another lock on the back door do not. When the feature is very reliable, and the consequences of its failure are minor, though, “redundant” is not a positive. “Belt-and-suspenders” is an epithet to describe someone who is too cautious. “Redundant” often means “superfluous” or “unnecessary”, which are never good.
That’s one reason we are so fond of the term “fault tolerant”. Fault tolerant is good, no matter what. When a feature is not reliable enough, especially given the severity of the consequences it protects us from, redundancy, or fault tolerance, becomes a very desirable characteristic. This is especially true when the failure of one feature is independent of the failure of a redundant feature.
But it also happens that the failure of one feature is not independent from the failure of a redundant feature. Both features fail because of a common cause. And if a common cause can lead to the simultaneous failure of two features serving the same purpose at the same point in the system, that common cause can lead to the simultaneous failure of three redundant features, or four, or forty.
So how do we account for common cause, and at what point does redundancy stop doing us any good?
Common Cause Failures
The beta model has proved helpful in accounting for common cause failures. Without even knowing what the common cause failures are, we can assign a fraction of failures to causes that would result in the simultaneous failure of all redundant devices. In safety, that fraction typically ranges from 1% to 10% for field devices, and 0.5% to 5% for logic solvers. That value is called β. According to IEC 61508-6, “Values of β lower than 0.5% for the logic subsystem and 1% for sensors would be difficult to justify.”
Considering the whole range of β and the failure rates typically associated with components in safety service, it quickly becomes apparent that redundancy stops providing any benefit after three devices.
Consider a system with an overall failure rate, λ, of 0.03 failures/year, a proof test interval, T, of 1 year, and a beta value, β, of 1%. The common-cause contribution to the overall average probability of failure on demand, PFDavg, is βλT/2, or 0.00015. This is the contribution to the overall PFDavg, regardless of the redundancy. It’s the probability the door comes off the hinges, regardless of how many locks there are. Watch what happens to the overall PFDavg as the redundancy increases, while the common-cause PFDavg remains constant:
- With one device, the overall PFDavg is 0.015000.
- With two devices, the overall PFDavg is 0.000444.
- With three devices, the overall PFDavg is 0.000157.
- With four devices, the overall PFDavg is 0.000150.
- With five devices, the overall PFDavg is 0.000150.
At three devices, the system is just about as good as it can get based on redundancy. More redundancy will reduce the independent contribution to PFDavg, but that will be swamped by the common cause contribution. So, two is better than one and three is better than two, but after three, you are just wasting your time.
This is true for any reasonable combination of λT and β. You can check it yourself. For the number of devices, n, the equation is
PFDavg = ((1 – β)λT)^n / (n + 1) + βλT/2
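To check it yourself, the equation can be sketched in a few lines of Python. This is a minimal sketch, not a validated SIL calculation: the function name pfd_avg and its parameter names are made up for illustration, and it reads the n in the equation as an exponent on the independent term, which is the reading that reproduces the five values listed above.

```python
def pfd_avg(n, failure_rate, test_interval, beta):
    """PFDavg for n redundant devices under the simple beta model in the text.

    Hypothetical helper for illustration only.
    n             -- number of redundant devices
    failure_rate  -- lambda, in failures per year
    test_interval -- T, proof test interval in years
    beta          -- fraction of failures attributed to common cause
    """
    lam_t = failure_rate * test_interval
    # Independent contribution: shrinks quickly as devices are added.
    independent = ((1.0 - beta) * lam_t) ** n / (n + 1)
    # Common-cause contribution: the same no matter how many devices there are.
    common_cause = beta * lam_t / 2.0
    return independent + common_cause


if __name__ == "__main__":
    # Values from the example in the text: lambda = 0.03/yr, T = 1 yr, beta = 1%.
    for n in range(1, 6):
        print(f"{n} device(s): PFDavg = {pfd_avg(n, 0.03, 1.0, 0.01):.6f}")
```

Changing the three inputs shows how quickly the independent term falls toward the common-cause floor of βλT/2 for other combinations of λT and β.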
Quadruple this or quintuple that? They may be good marketing, but they don’t really make you any safer.
Does More Than Three Devices Ever Make Sense?
There are perfectly good reasons to use more than three devices for safety. In a fluidized or packed-bed reactor, an array of several devices may be used to detect localized problems like hot spots. In that configuration, however, the devices are not redundant. Each is looking for a hot spot in a different place and each is necessary. They may all cause the same trip, so some will call this, for instance, a 1-out-of-20 trip. It’s not 1-out-of-20, though, because each is looking in a different place. Instead, it’s 20 individual one-out-of-one functions, each of which happens to do the same thing.
The same is true for an array of sensors arranged to detect a leak. If there are 12 sensors spaced around equipment containing a highly toxic gas, the 12 sensors don’t provide redundancy; they provide coverage. Each one is critical for the zone it’s there to cover.
Another reason for multiple devices is when it’s not a single measurement that matters, but a profile. A distillation column is a perfect example. There may be 9 temperature measurements up the side of the column, but there is only one profile. The vote is one-out-of-one. The fault tolerance will be based on a determination of how many measurements are necessary to determine the profile. It may be that every device is necessary, meaning no fault tolerance, or that one or two could fault, yet the system could still get a sufficiently accurate profile.
More Than Three Things to Do
The previous examples all spoke to more than three sensors. There are also perfectly good reasons to include more than three final control elements. One example is the case of a safety instrumented function that shuts off all feeds to a tank on high level in that tank. All feeds must be shut off to avoid a spill. If there are more than three feeds, then more than three shut-offs must be tripped.
That said, there would be an advantage to a common shut-off that stops all flow. But wait! Wouldn’t that create a common cause failure? It would, but there would be only one failure to worry about. If a system has four feeds, each of which must be shut off and each with a separate means to shut off, the probability of failure is roughly four times what it would be if only one shut-off had to work. The four shut-offs aren’t redundant, since all are necessary; they are just independent sources of failure.
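The “four times” figure can be illustrated with a short calculation. This is a rough sketch under stated assumptions: the function name prob_any_required_fails is hypothetical, each shut-off is assumed to fail independently of the others, and the per-device PFDavg of 0.015 is borrowed from the single-device example earlier purely as an illustrative number.

```python
def prob_any_required_fails(pfd_each, count):
    """Probability that at least one of `count` required devices fails on demand.

    Hypothetical helper for illustration only. Assumes every device has the
    same PFDavg and that the devices fail independently of one another.
    """
    # The function succeeds only if every required device works.
    prob_all_work = (1.0 - pfd_each) ** count
    return 1.0 - prob_all_work


if __name__ == "__main__":
    single = prob_any_required_fails(0.015, 1)  # one shut-off must work
    four = prob_any_required_fails(0.015, 4)    # all four shut-offs must work
    print(f"one required shut-off:   {single:.4f}")
    print(f"four required shut-offs: {four:.4f}")
    print(f"ratio: {four / single:.2f}")
```

For small per-device probabilities the result is close to, and slightly below, the number of devices multiplied by the per-device probability.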
Thinking About Installing a Redundant Device?
The next time you find yourself considering a “redundant” device, ask yourself why. There are perfectly good reasons to use more than three devices in a safety function, but redundancy is not one of them.