Root Cause Analysis (Overview)

Root cause analysis (RCA) is a class of problem-solving methods aimed at identifying the root causes of problems or incidents. The practice of RCA is predicated on the belief that problems are best solved by correcting or eliminating root causes, as opposed to merely addressing the immediately obvious symptoms. Directing corrective measures at root causes is intended to minimize the likelihood that the problem recurs. However, complete prevention of recurrence by a single intervention is not always possible. RCA is therefore often considered an iterative process, and is frequently viewed as a tool of continuous improvement.

Figure: Ishikawa fishbone-type cause-and-effect diagram (image via Wikipedia)

RCA is initially a reactive method of problem detection and solving: the analysis is done after an incident has occurred. With growing expertise, RCA can become a proactive method that forecasts the possibility of an incident before it occurs. Although one follows the other, RCA is a completely separate process from Incident Management.

Root cause analysis is not a single, sharply defined methodology; there are many different tools, processes, and philosophies of RCA in existence. However, most of these can be classed into five very broadly defined “schools” that are named here by their basic fields of origin: safety-based, production-based, process-based, failure-based, and systems-based.

  • Safety-based RCA descends from the fields of accident analysis and occupational safety and health.
  • Production-based RCA has its origins in the field of quality control for industrial manufacturing.
  • Process-based RCA is basically a follow-on to production-based RCA, but with a scope that has been expanded to include business processes.
  • Failure-based RCA is rooted in the practice of failure analysis as employed in engineering and maintenance.
  • Systems-based RCA has emerged as an amalgamation of the preceding schools, along with ideas taken from fields such as change management, risk management, and systems analysis.

Despite the seeming disparity in purpose and definition among the various schools of root cause analysis, there are some general principles that could be considered as universal. Similarly, it is possible to define a general process for performing RCA.

General principles of root cause analysis

  1. The primary aim of RCA is to identify the root cause of a problem in order to create effective corrective actions that will prevent that problem from ever recurring, otherwise known as the ‘100 year fix’.
  2. To be effective, RCA must be performed systematically as an investigation, with conclusions and the root cause backed up by documented evidence.
  3. There is always one true root cause for any given problem; the difficult part is having the stamina to reach it.
  4. To be effective, the analysis must establish a sequence of events or timeline to understand the relationships between contributory factors, the root cause and the defined problem.
  5. Root cause analysis can help to transform an old culture that reacts to problems into a new culture that solves problems before they escalate and, more importantly, reduces the number of problems occurring over time within the environment where the RCA process is operated.
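
The timeline called for in principle 4 can be kept as a simple, evidence-backed list of events. The following is a minimal sketch, not a prescribed format: the `Event` record, its fields, and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One entry in an RCA timeline (hypothetical record layout)."""
    when: datetime
    description: str
    evidence: str  # pointer to the log, report, or document backing the entry


def build_timeline(events):
    """Return the events sorted from earliest to latest."""
    return sorted(events, key=lambda e: e.when)


def events_up_to(timeline, problem_time):
    """Return the events at or before the moment the problem was observed."""
    return [e for e in timeline if e.when <= problem_time]


# Hypothetical incident data, for illustration only.
alarm = Event(datetime(2010, 9, 16, 9, 0), "Pump alarm raised", "alarm log")
skipped = Event(datetime(2010, 9, 16, 8, 30), "Maintenance skipped", "work order")
stopped = Event(datetime(2010, 9, 16, 9, 15), "Line stopped", "shift report")

timeline = build_timeline([alarm, skipped, stopped])
```

Requiring an `evidence` field on every entry also supports principle 2, since each step in the sequence has to point at something documented.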

General process for performing and documenting an RCA-based Corrective Action

Notice that RCA (in steps 3, 4 and 5) forms the most critical part of successful corrective action, because it directs the corrective action at the true root cause of the problem. The root cause is secondary to the goal of prevention, but without knowing the root cause, we cannot determine what an effective corrective action for the defined problem will be.

  1. Define the problem.
  2. Gather data/evidence.
  3. Ask why and identify the true root cause associated with the defined problem.
  4. Identify corrective action(s) that will prevent recurrence of the problem (your 100 year fix).
  5. Identify effective solutions that prevent recurrence, are within your control, meet your goals and objectives and do not cause other problems.
  6. Implement the recommendations.
  7. Observe the recommended solutions to ensure effectiveness.
  8. Apply a variability reduction methodology for problem solving and problem avoidance.
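
Step 3 ("ask why") can be carried out, for example, as a 5 Whys-style chain in which each answer becomes the next question. The sketch below assumes the investigator has already recorded each "why" as a statement-to-cause mapping; the function name, the `max_depth` parameter, and the example answers are hypothetical.

```python
def five_whys(problem, answers, max_depth=5):
    """Follow recorded 'why?' answers from the problem toward a root cause.

    `answers` maps a statement to the cause given when asking "why?" about it.
    The walk stops when no further answer is recorded or after `max_depth`
    questions, which also guards against circular answers.
    """
    chain = [problem]
    current = problem
    while current in answers and len(chain) <= max_depth:
        current = answers[current]
        chain.append(current)
    return chain


# Hypothetical answers gathered during an investigation.
answers = {
    "Machine stopped": "Fuse blew from overload",
    "Fuse blew from overload": "Bearing was not lubricated",
    "Bearing was not lubricated": "Lubrication pump was worn",
    "Lubrication pump was worn": "No maintenance schedule for the pump",
}

chain = five_whys("Machine stopped", answers)
candidate_root_cause = chain[-1]
```

The last element of the chain is only a candidate root cause; it still needs to be backed by the evidence gathered in step 2.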

Root cause analysis techniques

  • Barrier analysis – a technique often used, particularly in process industries. It is based on tracing energy flows, with a focus on barriers to those flows, to identify how and why the barriers did not prevent the energy flows from causing harm.
  • Bayesian inference
  • Causal factor tree analysis – a technique based on displaying causal factors in a tree-structure such that cause-effect dependencies are clearly identified.
  • Change analysis – an investigation technique often used for problems or accidents. It is based on comparing a situation that does not exhibit the problem to one that does, in order to identify the changes or differences that might explain why the problem occurred.
  • Current Reality Tree – a method developed by Eliyahu M. Goldratt in his theory of constraints that guides an investigator to identify and relate all root causes using a cause-effect tree whose elements are bound by rules of logic (Categories of Legitimate Reservation). The CRT begins with a brief list of the undesirable things we see around us, and then guides us towards one or more root causes. This method is particularly powerful when the system is complex, there is no obvious link between the observed undesirable things, and a deep understanding of the root cause(s) is desired.
    Figure: Current Reality Tree showing diagnosis of mult... (image via Wikipedia)
  • Failure mode and effects analysis
  • Fault tree analysis
  • 5 Whys
  • Ishikawa diagram, also known as the fishbone diagram or cause-and-effect diagram. The Ishikawa diagram is the method project managers prefer for conducting RCA, mainly because it is simple while the other methods are comparatively complex [1].
  • Pareto analysis
  • RPR Problem Diagnosis – An ITIL-aligned method for diagnosing IT problems.
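
Of the techniques above, change analysis is the most direct to illustrate: compare a situation that does not exhibit the problem with one that does, and list what differs. The sketch below assumes each situation has been captured as a dictionary of factors; the function name, the `ABSENT` marker, and the example factors are hypothetical.

```python
ABSENT = "<absent>"  # marker for a factor recorded in only one situation


def change_analysis(working, failing):
    """Return the factors that differ between two situations.

    Both arguments map a factor name to its observed value. The result maps
    each differing factor to a (working_value, failing_value) pair.
    """
    differences = {}
    for factor in sorted(set(working) | set(failing)):
        before = working.get(factor, ABSENT)
        after = failing.get(factor, ABSENT)
        if before != after:
            differences[factor] = (before, after)
    return differences


# Hypothetical snapshots of a line that ran fine and one that failed.
working = {"firmware": "1.2", "operator": "trained", "supplier": "A"}
failing = {"firmware": "1.3", "operator": "trained", "supplier": "A", "shift": "night"}

differences = change_analysis(working, failing)
```

The differences are only candidate explanations; each one still has to be tested against the evidence before it is treated as a cause.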

Common cause analysis (CCA) and common modes analysis (CMA) are evolving engineering techniques for complex technical systems. They determine whether common root causes in hardware, software, or highly integrated systems interaction may contribute to human error or improper operation of a system. Systems are analyzed for root causes and causal factors to determine the probability of failure modes, fault modes, or common-mode software faults due to escaped requirements. Complete testing and verification are also used to ensure that complex systems are designed with no common causes that lead to severe hazards. Common cause analysis is sometimes required as part of the safety engineering tasks for theme parks, commercial/military aircraft, spacecraft, complex control systems, large electrical utility grids, nuclear power plants, automated industrial controls, medical devices, or other safety-critical systems with complex functionality.
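
To see why common causes matter for redundant designs, consider two redundant components that each fail with probability p. The sketch below uses the beta-factor model, which is an assumption brought in here for illustration and is not named in the text above: a fraction beta of each component's failure probability is attributed to a shared cause that fails both components at once. The formula is the usual first-order approximation for small p, not an exact result.

```python
def redundant_pair_failure(p, beta=0.0):
    """Approximate failure probability of a two-component redundant pair.

    p    -- failure probability of a single component
    beta -- assumed fraction of p attributed to a common cause (0 = none)

    Independent failures must hit both components; a common-cause failure
    takes out both at once. First-order approximation, valid for small p.
    """
    if not 0.0 <= p <= 1.0 or not 0.0 <= beta <= 1.0:
        raise ValueError("p and beta must lie between 0 and 1")
    independent_part = ((1.0 - beta) * p) ** 2
    common_cause_part = beta * p
    return independent_part + common_cause_part


no_common_cause = redundant_pair_failure(0.01)
with_common_cause = redundant_pair_failure(0.01, beta=0.1)
```

With these illustrative numbers, even a modest common-cause fraction dominates the result, which is why redundancy alone does not rule out a shared root cause.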

Basic elements of root cause

  • Materials
    • Defective raw material
    • Wrong type for job
    • Lack of raw material
  • Manpower
    • Inadequate capability
    • Lack of knowledge
    • Lack of skill
    • Stress
    • Improper motivation
  • Machine / Equipment
    • Incorrect tool selection
    • Poor maintenance or design
    • Poor equipment or tool placement
    • Defective equipment or tool
  • Environment
    • Orderly workplace
    • Job design or layout of work
    • Surfaces poorly maintained
    • Physical demands of the task
    • Forces of nature
  • Management
    • No or poor management involvement
    • Inattention to task
    • Task hazards not guarded properly
    • Other (horseplay, inattention, ...)
    • Stress demands
    • Lack of process
  • Methods
    • No or poor procedures
    • Practices are not the same as written procedures
    • Poor communication
  • Management system
    • Training or education lacking
    • Poor employee involvement
    • Poor recognition of hazard
    • Previously identified hazards were not eliminated
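
The categories above map naturally onto the branches of an Ishikawa diagram. As a rough sketch, findings from an investigation can be grouped under those branches before the diagram is drawn. Only three of the categories are reproduced here to keep the example short, and the function name and the "Unclassified" bucket are assumptions.

```python
# A subset of the categories and causes listed above.
ELEMENTS = {
    "Materials": [
        "Defective raw material",
        "Wrong type for job",
        "Lack of raw material",
    ],
    "Machine / Equipment": [
        "Incorrect tool selection",
        "Poor maintenance or design",
        "Poor equipment or tool placement",
        "Defective equipment or tool",
    ],
    "Methods": [
        "No or poor procedures",
        "Practices are not the same as written procedures",
        "Poor communication",
    ],
}


def group_findings(findings, elements=ELEMENTS):
    """Group findings by category; unmatched ones go under 'Unclassified'.

    Matching is by exact wording, ignoring case and surrounding whitespace.
    """
    index = {
        cause.lower(): category
        for category, causes in elements.items()
        for cause in causes
    }
    grouped = {}
    for finding in findings:
        category = index.get(finding.strip().lower(), "Unclassified")
        grouped.setdefault(category, []).append(finding)
    return grouped


grouped = group_findings(
    ["Poor communication", "defective raw material", "Cosmic rays"]
)
```

Anything that lands under "Unclassified" is a prompt to either reword the finding or extend the category list.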

From many sources including Wikipedia, the free encyclopedia (16 Sept 2010)
