“To err is human, to forgive, divine.” — Alexander Pope
I was in a PHA once where the lone operator on the PHA team finally grew weary of the discussion of operator error. “Why is it that when you engineers make a mistake, you call it design error, but when I make a mistake, you call it an operator error and pin it straight to me?”
We Don’t Pay Our People to Make Mistakes
The subject of operator error doesn’t come up that often. During the design phase of a project, a process design includes whatever features it needs to work, and attention focuses on keeping it working. Then, during the operation of that process, we understand that pieces of equipment may fail, typically at random, but we just expect operators to do the right thing. Sometimes we don’t even know what the right thing is, but still, that’s what we expect.
So, perhaps I shouldn’t have been surprised when a plant manager insisted, “Why are we even talking about operator error? We don’t pay our people to make mistakes.”
No one pays people to make mistakes. The mistakes just come along as a bonus. Everyone errs; it’s part of the human condition. When we hire people to do a job, we are hiring a predictable likelihood of error.
Operator Error in Process Safety Management
Good process safety management requires that we address operator error both before it happens, during the process hazard analysis (PHA), and after it happens, during the incident investigation. The OSHA standard on Process Safety Management, 29 CFR 1910.119, tosses out the phrase “human factors” once in the section on process hazard analysis, but otherwise the regulation relies entirely on training as the tool for addressing operator error. The implication, then, is that a perfectly trained operator will never err, and that if we want to reduce the impact of operator error on our process facilities, we just need to train our operators better.
Training Is Necessary, But Not Sufficient
Of the three types of errors (lapses, mistakes, and violations), training will reduce the likelihood of mistakes and, to a certain extent, violations. Mistakes happen when an operator simply doesn’t know what they are supposed to do and so does the wrong thing; they can only be addressed with training. A well-trained operator will not make the mistakes that result from not knowing what to do, whether those mistakes create a problem or fail to correct one.
But not all errors are mistakes. There are also violations, which happen when the operator understands what they’ve been asked to do and chooses to do something else. Sometimes this is because what they’ve been asked to do doesn’t make sense. Training can help make sense of the instructions and, if the instructions are good, increase the likelihood that operators will follow them. If the instructions are not good, further training will serve to highlight that as well. At that point, the instructor is left either with “Do it because I said so,” which will not address the tendency toward violations, or with fixing the instructions, which will.
And then there are lapses. These happen despite knowing what to do, wanting to do it, and being able to do it. They happen because we’re human, and all the training in the world can’t fix that. A well-trained human is still a human, vulnerable to the lapses that are simply a feature of being human. A process design that is vulnerable to lapses is a process design that is vulnerable.
When Operator Errors Are Really Design Errors
Nothing is perfect. No piece of equipment, no procedure, and no operator. When we recognize that a process is vulnerable to the imperfection of equipment, we invest in redundancy to deal with the inevitable equipment failures. We have surge tanks. We have double block valves. We have installed spare pumps. We know that there will be failures, and for those that are critical, we modify the design to deal with those inevitable failures.
For operators, though, we just expect them to be perfect. Anything less than perfection we blame on the operator. Not on the design. Not on the organizational structure. Not on staffing levels set to support operations only when nothing is going wrong.
If we really want to be less vulnerable to operator error, we need to understand the criticality of operator actions to the same extent that we understand equipment criticality. That, or accept the vulnerability as a condition of operating.
Understanding the Criticality of Operator Actions
The tools we have for understanding the criticality of operator actions are PHAs and incident investigations. In either, we have an opportunity for understanding, but only if we are determined to understand. That means being more specific in our description of causes than “operator error”. What specific error are we concerned about? Under what circumstances would such an error be made? How frequently is there an opportunity to make that error? And most importantly, what safeguards can be put in place to make us less vulnerable to that specific error?
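The frequency question lends itself to simple screening arithmetic. As a minimal sketch, in Python and using a purely hypothetical task and illustrative numbers (none of them drawn from the standard or from this article), a team could turn a specific, well-described error into an expected rate:

# A minimal sketch with hypothetical numbers:
# expected errors per year = opportunities per year x probability of error per opportunity.

def expected_errors_per_year(opportunities_per_year, error_probability):
    """Rough screening estimate of how often a specific, well-described error occurs."""
    return opportunities_per_year * error_probability

# Example: a manual valve lineup performed 50 times a year, with an assumed
# 1-in-100 chance of a lapse on any given attempt.
print(expected_errors_per_year(50, 0.01))  # 0.5, roughly one lapse every two years

Rough numbers like these are what allow us to weigh the criticality of an operator action on the same footing as the criticality of a piece of equipment.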
Paul Ehrlich once famously riffed on Pope’s line: “To err is human, but to really foul things up you need a computer.” Fine. Let that serve as a warning that automation, as useful as it is in addressing the potential for operator error, is also not perfect. But we still need to address operator error. To err is human, so to plan for it is our responsibility.