HP0823--SF--Plant Safety and Environment

Make accident investigations water-tight

A. J. Khan, Contributing Author, Riyadh, Saudi Arabia

Accident investigations are to a safety manager as a stethoscope is to a doctor. These methodical, non-intrusive tools support a quick observation of a system and yield a conclusion whose quality is proportional to the competence of the user. In the case of an investigation, what is different is the “approval committee,” which has the authority to dissect the outcome, counter the narrative, challenge the status quo and, in all fairness, accept its own “management failure.”

Before delving further into the subject, the basics must be reviewed. According to Rasmussen,¹ accidents are caused by a loss of control of physical processes that can injure people and/or damage property and the environment. Management is interested in investigating an accident, primarily to prevent a recurrence, satisfy legal requirements and review existing barriers for further improvement leading to enhanced operational efficiency.²,³ However, not all investigations achieve the objective entirely.

Ferry highlights the criteria related to the usefulness of investigations.⁴ A successful investigation process should be (FIG. 1):

Understandable—The investigation output should be clear, concise and readily understandable.
Satisfying—The results should satisfy those who initiated the investigation and other individuals who demand results from it.
Direct—The investigation process should provide results that do not require the collection of more data before the needed controls can be identified and changes made.

FIG. 1. Criteria for a successful investigation process.

This article is meant to provide guidance to senior managers and technical leaders who are “approvers” of investigation reports but lack the time (and the priority) to go in-depth to find the real cause of incidents. While it would be comforting to think that all investigations are conducted thoroughly, statistics have proven otherwise. Tell-tale signs that an investigation system is not living up to its true potential include:

Repeat incidents
Lack of synchronization between incident root causes and action items
No mention of human error in contributing or root causes
Recommendations are long pending, even for serious incidents
No mention of improvement in design standards or procedures
A short time after the investigation, the team is unaware of its outcome (what happened or what failed)
Lack of concurrence regarding identified root causes among various members of the same team
Not every failed barrier is reflected in the action plan.
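
Two of these signs, root causes that are out of step with action items and failed barriers that never reach the action plan, lend themselves to a simple automated check. The Python sketch below is illustrative only: the record layout, the field names and the sample findings are hypothetical assumptions, not taken from any particular incident database.

```python
from dataclasses import dataclass, field


@dataclass
class ActionItem:
    """One recommendation in the action plan (hypothetical layout)."""
    description: str
    # Identifiers of the root causes or failed barriers this action covers.
    addresses: set[str] = field(default_factory=set)


def uncovered_findings(findings: set[str], actions: list[ActionItem]) -> set[str]:
    """Return findings (root causes or failed barriers) that no action addresses."""
    covered: set[str] = set()
    for action in actions:
        covered |= action.addresses
    return findings - covered


def orphan_actions(findings: set[str], actions: list[ActionItem]) -> list[str]:
    """Return descriptions of actions that do not trace back to any finding."""
    return [a.description for a in actions if not (a.addresses & findings)]


# Hypothetical example data for a single investigation report.
findings = {"RC-1 procedure gap", "FB-1 relief valve", "FB-2 gas detector"}
plan = [
    ActionItem("Revise startup procedure", {"RC-1 procedure gap"}),
    ActionItem("Recalibrate gas detectors", {"FB-2 gas detector"}),
    ActionItem("Repaint pipe rack", set()),
]

print(uncovered_findings(findings, plan))  # findings with no action
print(orphan_actions(findings, plan))      # actions with no finding
```

An approver could run a check of this kind before sign-off: a non-empty result from either function is a prompt to ask the investigation team why a finding has no action, or why an action has no finding behind it.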

The many factors that contribute to these symptoms can be summed up for easy understanding (FIG. 2) as:

Managerial—A lack of task prioritization; the team is severely overloaded; a lack of quantity and quality of provided resources; authoritarian leadership leading to mostly one-way communication; unrealistic deadlines; candid sharing of investigation outcome is discouraged; initiative overload; and fear of losing bonus in a key performance indicator (KPI)-driven performance reward structure.
Political—Task leaders are aware of political alliances; direct/indirect intervention from direct/functional leadership; selective treatment towards/against specific individuals having direct correlation with the incident; and a lack of interest among team leaders/members.
Technical—Lack of specific expertise required for an in-depth investigation (i.e., dispersion modeling, stress analysis, fitness-for-service assessments, complex interactions); lack of diversified experienced team members, including direct supervision staff; non-involvement of subject matter experts (SMEs); and lack of availability of previous design/maintenance/commissioning and root cause analysis (RCA) data, etc.
Resource constraints—Staff scarcity due to budget cuts; lack of preventive maintenance programs; outsourcing of important inspections without regard to competence; and decision-makers’ reluctance to accept responsibility for the negative outcomes of their own decisions.

FIG. 2. Factors that contribute to the symptoms of an underperforming investigation system.

The role of leadership is crucial in terms of managing safety and staff motivation, especially during budget cuts and resource limitations. Any corporate push of “doing more with less” should be taken with a pinch of salt, since “quality never comes cheap.” Leaders must be aware that every action has an equal and opposite reaction—they might not hear this from direct reports, but business and safety KPIs will definitely provide strong indications of success or failure.

ACHIEVING BEST RESULTS

Maintaining a robust incident investigation system under such constraints is a challenge that leadership must face and conquer. Experience has proven that the best results are achieved with the steps detailed here.

Honest interaction with impacted employees. Leaders must meet the impacted unit’s operators at all levels, which can help leaders develop their own understanding of accident causation. Leaders must listen intently to employees’ feedback and their proposed solutions to any issues. Leaders should actively resist providing answers and instead generate insights from the team’s feedback. They should not become “emotionally attached” to a certain contributing or root cause. There is no harm in tossing questions to the investigation team for active debate and assessment. Employees appreciate personal feedback on the concerns they raise and public appreciation of good work.

Empowering the team. This is easier said than done, but the more empowered the team, the more agile it becomes. Decisions do not have to be escalated needlessly, leaving leaders free to focus on strategic tasks while functional leaders can manage the mundane. Most empowered teams demonstrate the following traits:

Mutual respect and trust
Ownership of resources
Focus on results
Ethical conduct in difficult situations.

Weekly updates from the investigation team. A short update about an investigation’s progress, the major issues being faced and any support requested from management can make all the difference between an excellent outcome and one that merely checks the box. Leadership must make clear that it would rather hear an uncomfortable truth that helps avoid issues in the long term than a rosy lie that leads to an embarrassing outcome in the short term.

Audit the auditor. Insightful, open-ended questions put to the investigation team will determine how deep the team digs to ascertain root causes. These questions can include:

Have you been able to identify and troubleshoot all failed barriers?
What were the missing barriers? How do similar benchmark companies manage this risk?
Have job-related factors been considered?
Can the same incident occur in another shift?
Were environment-related factors studied?
Was the team’s competence verified prior to assigning it individual responsibilities? What about a refresher training system?
If previous failures were investigated, what actions were taken on earlier recommendations?
Was the design correct to withstand both normal and abnormal conditions? Has it been reviewed by a competent team prior to commissioning and during the operations phase? What happened to the outcome generated as a result?
Were technical integrity-related factors verified [e.g., previous preventive and corrective maintenance (PM/CM) records, failure reporting]? What actions were taken?
Were operating integrity-related factors verified? Were alarms documented and personnel trained? Was equipment operated within design limits and were any deviations investigated? Were any alarms/interlocks/controller set points tampered with without proper risk assessment?

Drive the investigation’s quality by challenging the preliminary assessment report. A best practice is to invite relevant professionals from other divisions/assets who can assist in reviewing the preliminary report with a cold eye. The human mind is prone to many biases, so it is always beneficial to calibrate against a competent resource that has no stake in the outcome. The wealth of information expected from such an intervention includes knowledge sharing on missing barriers, insights on failed barriers and best practices around the subject matter. Encourage active investigation of recovery measures, in addition to preventive and mitigative barrier failures, thereby driving operational resilience.

If serious about safety, be authentic in accountability. Management’s focus should be to improve the process and not incite fear with disciplinary actions. Accountability must be conducted with the utmost responsibility. Best results are achieved when accountability is:

Endorsed by entire management team
Driven by an approved management system (e.g., consequence management system) rather than an opinion
Consistently applied to involved personnel, according to their proven contribution and regardless of their seniority/designation
Distributed equally to employees and business partners/contractors
Conveyed directly by supervisors/managers to employees.

Short-term impacts (e.g., verbal/written warning, safety gift deduction/presenting incident to shifts, time off from the core job while at the worksite) are preferred over long-term impacts (e.g., yearly appraisals, bonuses, career progression). The purpose is to drive behavioral improvements, not to drive the team member away. Remember, we are interested in keeping both the lesson and the person who learned it best within the organization.

Push your team to delve deeper and observe better. Some investigations are complex due to process design, unknown variables, network interconnections and a lack of awareness about certain mechanisms. Rather than closing the investigation within its stipulated time:

Extend the deadline
Provide expert resources
Benchmark with similar businesses/process designs
Discuss in professional forums (e.g., LinkedIn groups, industry conferences, peer discussions). It is amazing what can be learned from like-minded professionals.

The craft of preparing an action plan. While many professionals place a priority on determining root cause(s), it is the actions generated as a result that can mark the difference between success and failure. While reviewing the action plan:

Ensure the basics are in place before developing an effective control [i.e., management system/procedure, competence, monitoring mechanisms (preferably automated), and assurance mechanisms].
Never assume that an action party is aware of the recommendation’s context since it might get delegated lower. Document what you need from the action party with as much clarity as possible.
A failed barrier must receive improvements in design, operation and maintenance, with independent assurance mechanisms by different responsible parties.
Both contributory and root causes must be considered for active intervention since both active barriers and early warning systems are needed to prevent the next incident.
An action party should preferably have the resources to complete the task on its own. Otherwise, the action should be assigned to that party’s manager.
Incorporate the investigation team leader or a senior relevant advisor in the action closure loop in the incident database. Ensure the intent of action closure is achieved prior to closure approval.
Have the investigation team leader gain concurrence with the action owner. A signed final report means the end of further discussion.
Target dates are provided by the action party and are open to challenge only by executive management. Any target date extension should be approved by the investigation team sponsor or site VP, whichever is more senior.
Recommendations should be robust enough to identify and curb similar unsafe acts and conditions, with similar chances of failure, across other organizational units/assets.
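
Several of the review points above (documented context, concurrence with the action owner, verified intent before closure, and a senior approver for date extensions) can be expressed as simple checks. The Python sketch below is a minimal illustration under stated assumptions: the `Action` fields, the numeric seniority scale and the sample tank alarm action are all hypothetical and do not describe any specific incident database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Action:
    """A single action-plan entry (hypothetical fields)."""
    context: str                    # what is needed from the action party, spelled out
    owner: str
    target_date: date               # provided by the action party
    owner_concurred: bool = False   # action owner agreed to the recommendation
    intent_verified: bool = False   # investigation lead confirmed the intent was met


def review_issues(action: Action) -> list[str]:
    """List the reasons an action is not yet ready for closure approval."""
    issues = []
    if not action.context.strip():
        issues.append("context for the action party is missing")
    if not action.owner_concurred:
        issues.append("action owner has not concurred")
    if not action.intent_verified:
        issues.append("intent of closure not verified by investigation lead")
    return issues


def extension_approver(sponsor: tuple[str, int], site_vp: tuple[str, int]) -> str:
    """Pick the more senior of the two; a larger number means more senior (assumed scale)."""
    return max(sponsor, site_vp, key=lambda person: person[1])[0]


# Hypothetical example: concurrence obtained, closure intent not yet verified.
action = Action(
    context="Install a high-level alarm on the storage tank per the revised standard",
    owner="Maintenance lead",
    target_date=date(2024, 6, 30),
    owner_concurred=True,
)

print(review_issues(action))
print(extension_approver(("Sponsor", 3), ("Site VP", 4)))
```

The value of such a check lies less in the code than in forcing each rule to be stated unambiguously; an approver can apply the same checklist by hand when reviewing the plan.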

Lead, follow or get out of the way. The management team should initiate a Q&A session with the investigation team and impacted unit personnel. Everyone should be given a chance to contribute. Once the report is approved, any question on the report’s credibility must be addressed.

View the investigation report as both a door to the past and a window to the future. While conducting the investigation, the team must ensure that relevant data, documentation, distributed control system (DCS) trends, logs, records, procedure/report excerpts, interview logs, fault tree analysis (FTA)/Tripod Beta trees, minutes of meetings, etc., are properly archived. Future colleagues and teams will benefit from this hard work when investigating a similar incident in the future.

Integrate the investigation system with an effective lesson-learned system. Major investigations must be embedded in various platforms to maximize lessons learned. Best practices include developing a one-page flyer for every investigation to be shared with all employees in a safety meeting. A learning from incident (LFI) committee should review various incidents and develop proactive action plans to supplement existing controls across assets/divisions/businesses. Operations and maintenance personnel should incorporate important lessons learned in their work procedures and instruction manuals. An online LFI database and hard-copy files should be kept for new employees, turnarounds (TARs) and safety training.

LITERATURE CITED

1. Rasmussen, J., “Risk management in a dynamic society: A modelling problem,” Safety Science, Vol. 27, Elsevier Science Ltd., 1997.
2. Hagan, P., et al., “Accident prevention manual: Administration and programs,” 14th Ed., National Safety Council, 2015.
3. Sklet, S., “Methods for accident investigation,” Norwegian University of Science and Technology, 2002.
4. Ferry, T., “Modern accident investigation and analysis,” 2nd Ed., John Wiley and Sons, 2007.

AAMISH J. KHAN is an Operational Safety Consultant who has been supporting various renowned companies in the oil and gas, petrochemical and utilities sectors in their safety culture enhancement journeys for two decades. He has multifaceted exposure to operations leadership, occupational safety, process safety management (PSM), integrity assurance and audit, enabling him to identify, analyze and treat risk effectively throughout an asset’s lifecycle. He is now involved in co-authoring the Center for Chemical Process Safety (CCPS) Safe Work Practices guidelines, with the objective of enhancing the sharing of lessons learned across the global industry and softening the safety impact on workers’ lives. Khan is a graduate chemical engineer and holds an MS degree in enterprise risk management from Boston University in the U.S.