Yes, But…Hardware Fault Tolerance

“Therefore, by virtue of the authority vested in me by the Universita Committeeatum E Pluribus Unum, I hereby confer upon you the honorary degree of Th.D.,… er, ‘Doctor of Thinkology’.” —The Wizard of Oz

Most of our readers understand that for a Safety Instrumented Function—a SIF—to have a particular Safety Integrity Level—a SIL—its Average Probability of Failure on Demand—its PFD_AVG—must be low enough.

The committees that wrote the standards for Safety Instrumented Systems (SIS) also added another requirement: Hardware Fault Tolerance (HFT). It’s not enough to reach the Emerald City. The committees decided to set another task so that a SIF might prove itself worthy.

What value does HFT add? Whether or not it adds any value, it is a fact of the standards, so for some it is a requirement. For all of us, though, it is worth considering.

Achieving PFD_AVG

The most important feature of a SIF is its PFD_AVG. When a SIF works, it prevents a specific hazardous event from occurring. The lower the PFD_AVG, the more likely the SIF is to prevent that hazardous event. That is why the Risk Reduction Factor (RRF) is inversely proportional to PFD_AVG.
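The relationship between PFD_AVG, RRF, and SIL can be sketched in a few lines of Python. The function names here are illustrative, and the band limits are the usual low-demand ones (SIL n covers PFD_AVG from 10^-(n+1) up to, but not including, 10^-n); treat this as a sketch, not a substitute for the tables in the standard.

```python
def risk_reduction_factor(pfd_avg: float) -> float:
    """RRF is the reciprocal of the average probability of failure on demand."""
    if not 0.0 < pfd_avg < 1.0:
        raise ValueError("pfd_avg must be between 0 and 1")
    return 1.0 / pfd_avg


def sil_from_pfd_avg(pfd_avg: float) -> int:
    """Return the low-demand SIL band for a PFD_AVG (0 means below SIL 1).

    Anything better than the SIL 4 band is still reported as 4,
    because the standards define no higher level.
    """
    for sil, upper_limit in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd_avg < upper_limit:
            return sil
    return 0


if __name__ == "__main__":
    for pfd in (0.04, 0.008):
        rrf = risk_reduction_factor(pfd)
        print(f"PFD_AVG={pfd}: RRF={rrf:.0f}, SIL {sil_from_pfd_avg(pfd)}")
```

A PFD_AVG of 0.04 corresponds to an RRF of 25 and falls in the SIL 1 band; 0.008 corresponds to an RRF of 125 and falls in the SIL 2 band.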

When a basic SIF design doesn’t provide enough risk reduction, that is, when its PFD_AVG is too high, one of the strategies to improve the risk reduction and lower the PFD_AVG is to test the SIF and its components more often. A SIF that is designed to be proof-tested every five years with a resulting PFD_AVG of 0.04 will see the PFD_AVG drop to 0.008 if the proof-test interval is reduced to once a year. Going from 0.04 to 0.008 can be the difference between SIL 1 and SIL 2.
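The numbers above are consistent with the common simplified 1oo1 approximation, PFD_AVG ≈ λ_DU × TI / 2, where λ_DU is the dangerous undetected failure rate and TI is the proof-test interval. That formula is an assumption on our part: it ignores diagnostics, repair time, and imperfect proof-test coverage, and it lumps the whole SIF into a single element. Under those assumptions, a minimal sketch looks like this:

```python
def pfd_avg_1oo1(lambda_du: float, test_interval: float) -> float:
    """Simplified 1oo1 approximation: PFD_AVG ~ lambda_DU * TI / 2.

    lambda_du and test_interval must use matching units
    (here: failures per year and years).
    """
    return lambda_du * test_interval / 2.0


# Back out the failure rate implied by PFD_AVG = 0.04 at a five-year interval.
# This value is derived for illustration only, not taken from any data sheet.
lambda_du = 2.0 * 0.04 / 5.0  # failures per year

if __name__ == "__main__":
    for years in (5.0, 1.0):
        print(f"TI={years:.0f} y: PFD_AVG={pfd_avg_1oo1(lambda_du, years):.3f}")
```

Because the approximation is linear in TI, cutting the interval from five years to one cuts PFD_AVG by the same factor of five.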

Another strategy to improve the risk reduction of a SIF is to turn to redundancy. A basic SIF design that relies on a five-year proof-test interval and one-out-of-one (1oo1) architecture on each component with a resulting PFD_AVG of 0.04 will see the PFD_AVG drop to 0.002 when all of the components are converted to 1oo2 architectures, all while keeping the five-year proof-test interval.
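The redundancy figure can be checked the same way. A widely used simplified approximation for a 1oo2 pair is PFD_AVG ≈ (λ_DU × TI)² / 3. This sketch assumes that formula, ignores common-cause failures entirely, and again treats the SIF as one lumped element, so it is an illustration, not a design calculation.

```python
def pfd_avg_1oo1(lambda_du: float, test_interval: float) -> float:
    """Simplified 1oo1 approximation: lambda_DU * TI / 2."""
    return lambda_du * test_interval / 2.0


def pfd_avg_1oo2(lambda_du: float, test_interval: float) -> float:
    """Simplified 1oo2 approximation: (lambda_DU * TI)^2 / 3.

    Assumes independent failures; a real calculation would add a
    common-cause (beta factor) term, which usually dominates.
    """
    return (lambda_du * test_interval) ** 2 / 3.0


if __name__ == "__main__":
    lambda_du = 0.016  # per year, implied by PFD_AVG = 0.04 at TI = 5 years
    ti = 5.0           # years
    print(f"1oo1: {pfd_avg_1oo1(lambda_du, ti):.4f}")
    print(f"1oo2: {pfd_avg_1oo2(lambda_du, ti):.4f}")
```

Under these assumptions the 1oo2 result is roughly 0.0021, in line with the 0.002 quoted above. Including common-cause failures would raise it.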

The first strategy depends on testing more often; the second strategy depends on installing redundant equipment. Another term for redundancy is hardware fault tolerance. The basis for choosing one strategy over the other is cost. Which is less expensive, testing more often or buying and installing redundant equipment? Obviously, it will depend on the specific circumstances of the SIF.

Putting Their Thumb on the Scale

The committees that wrote the standard have expressed a preference for the buy-more-hardware strategy over the test-more-often strategy. “Note: Fault tolerance is the preferred solution to achieve the required confidence that a robust architecture has been achieved.”[1]

They’ve defined the requirements in tables. The most recent standard, the 2017 version of IEC 61511, includes this table:

Table 6 – Minimum HFT requirements according to SIL

SIL	Minimum required HFT
1 (any mode)	0
2 (low demand mode)	0
2 (high demand or continuous mode)	1
3 (any mode)	1
4 (any mode)	2
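Table 6 is simple enough to express as a lookup. The sketch below encodes the rows above and compares them with the fault tolerance of an MooN voting group, which tolerates N − M dangerous faults (so 1oo1 has an HFT of 0, while 1oo2 and 2oo3 each have an HFT of 1). The function names and mode strings are our own, chosen for illustration.

```python
MODES = ("low demand", "high demand", "continuous")


def min_required_hft(sil: int, mode: str) -> int:
    """Minimum HFT per the Table 6 rows quoted above."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if sil == 1:
        return 0
    if sil == 2:
        return 0 if mode == "low demand" else 1
    if sil == 3:
        return 1
    if sil == 4:
        return 2
    raise ValueError(f"SIL must be 1 to 4, got {sil}")


def architecture_hft(m: int, n: int) -> int:
    """An MooN voting group tolerates N - M dangerous faults."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= M <= N")
    return n - m


def satisfies_hft(m: int, n: int, sil: int, mode: str) -> bool:
    """True when the MooN group meets the minimum HFT for the SIL and mode."""
    return architecture_hft(m, n) >= min_required_hft(sil, mode)


if __name__ == "__main__":
    for m, n in ((1, 1), (1, 2), (2, 3)):
        ok = satisfies_hft(m, n, 3, "low demand")
        print(f"{m}oo{n} for SIL 3, low demand: {'ok' if ok else 'HFT too low'}")
```

Note that nothing in this lookup refers to PFD_AVG, which is the point of the discussion that follows.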

The higher the SIL, the greater the required HFT, regardless of the PFD_AVG the SIF design achieves. Fortunately, the new version is less insistent in its HFT demands, eliminating separate tables for programmable and non-programmable devices and eliminating easily manipulated constructs like Safe Failure Fraction. While the new version of IEC 61511 is less adamant than the earlier version of IEC 61511 and the current version of ISA S84, it still calls for minimum HFT requirements.

Typically, it is very difficult to achieve the necessary PFD_AVG without also satisfying the HFT requirements. However, what if someone could? A member of one of the committees once explained the HFT requirements this way: “Since PFD_AVG is a function of both failure rate and proof test interval, we didn’t want users thinking they could achieve a SIL by testing all the time.”

Why not?

What If Something Breaks?

No one really wants to test “all the time,” but if they were willing, why shouldn’t that be an option? Or what about a design that automates proof testing, so that it is not difficult to “test all the time”? Should that option be precluded?

Members of the committees that wrote the SIS standards worried that something in a high integrity function would fail, rendering the function unable to work. “What if something breaks?” There is no “what if?” Everything breaks. Everything. The whole point of PFD_AVG calculations is to determine how often those failures occur. When the PFD_AVG is too high, the most common strategy is to use redundant architectures, such as 1oo2 or 2oo3.

Increasingly, components being used are so reliable that 1oo2 or 2oo3 architecture is not required to achieve the necessary PFD_AVG. The HFT requirement may compel it anyway, even if the component is as reliable as gravity. “What if it breaks?” “It’s gravity.” “Yes, but what if it breaks?”
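To make the gap concrete, consider a hypothetical 1oo1 element with a dangerous undetected failure rate of 0.001 per year, proof-tested annually. Both numbers are invented for illustration, and the PFD_AVG formula is the simplified λ_DU × TI / 2 approximation, which is our assumption and not something the standard prescribes. The sketch checks the two requirements separately for a SIL 3 target in low demand mode.

```python
# Minimum HFT for low demand mode, per the Table 6 rows quoted earlier.
MIN_HFT_LOW_DEMAND = {1: 0, 2: 0, 3: 1, 4: 2}


def check_1oo1(lambda_du: float, test_interval: float, target_sil: int):
    """Return (pfd_avg, pfd_ok, hft_ok) for a single, non-redundant element.

    pfd_ok: PFD_AVG is below the upper limit of the target SIL band.
    hft_ok: a 1oo1 element has HFT = 0; compare with the table minimum.
    """
    pfd_avg = lambda_du * test_interval / 2.0  # simplified 1oo1 approximation
    pfd_ok = pfd_avg < 10.0 ** (-target_sil)
    hft_ok = 0 >= MIN_HFT_LOW_DEMAND[target_sil]
    return pfd_avg, pfd_ok, hft_ok


if __name__ == "__main__":
    # Hypothetical values: 0.001 failures per year, tested once a year.
    pfd_avg, pfd_ok, hft_ok = check_1oo1(0.001, 1.0, target_sil=3)
    print(f"PFD_AVG={pfd_avg:.4f}, meets SIL 3 PFD: {pfd_ok}, meets HFT: {hft_ok}")
```

With these made-up numbers the element clears the SIL 3 PFD_AVG limit on its own, yet still falls short of the minimum HFT of 1.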

The HFT necessary to meet the PFD_AVG requirement is really all the HFT that should be required.

What to Do with HFT

For those of you who are bound to follow the older SIS standards to the letter, know that your skepticism about HFT is well-founded. The redundancy you need, given your intended proof-test interval, is what you need to achieve the necessary PFD_AVG. The rest is about getting a certificate. It is in your interest, however, to encourage the standards committees to address this issue in a future version.

Go ahead. Pay attention to the man behind the curtain.

As for those of you who are not bound to the older SIS standards, take comfort that for most SIFs you will encounter (SIL 1, and SIL 2 in low demand mode) the required minimum HFT is 0. If you get to a SIL 3 SIF without naturally providing an HFT of 1, that is remarkable, and it is worth doing just what the standard requires: preparing the “justification … to demonstrate that the proposed alternative architecture provides an equivalent or better solution…” including “using more reliable items of the same technology; changing for a more reliable technology.”

[1] IEC 61511-1, Functional safety – Safety instrumented systems for the process industry sector – Part 1: Framework, definitions, system, hardware and application programming requirements. Edition 2.1 2017-08. International Electrotechnical Commission. §11.4.6, p.55.

Author

Mike Schmidt

With a career in the CPI that began in 1977 with Union Carbide, Mike was profoundly impacted by the 1984 tragedy in Bhopal and has been working on process safety ever since.