Fault, error, and failure
Mohammad Bakhtiari
Journal of Applied Clinical Medical Physics, 26(6). Published 2025-04-21.
DOI: 10.1002/acm2.70106
https://onlinelibrary.wiley.com/doi/10.1002/acm2.70106
Citations: 0
Abstract
The sequence and terminology are crucial. The chain is cyclic and causal: a fault activates, triggering an error. The error propagates, giving rise to further errors. When an error is observed in the external environment, it becomes a failure. This failure can subsequently cause a fault in the system it serves, and the cycle continues.
Understanding systems and their boundaries is essential because the definitions of fault, error, and failure can vary based on whether they are internal or external to a system. A system is an entity that interacts with other entities, including hardware, software, humans, and the physical world. It is composed of components, each of which can be another system, creating a recursive structure that stops when a component is considered atomic. The system boundary is crucial because it defines the common frontier between the system and its environment, determining the inputs and outputs of the system. A fault that occurs within this boundary is the cause of an error. This error propagates through the system's internal states and may eventually become visible as a failure in the system's external state. This failure, perceived at the service interface, can act as an external fault for another system, initiating a new cycle of fault, error, and failure. The line between error detection and error observation is subtle but significant. Error detection refers to identifying discrepancies internally within a system, whereas error observation occurs when these discrepancies manifest externally, leading to an observable failure state.
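The chain described above can be sketched in a few lines of Python. This is only an illustrative model, not something taken from the paper: the `System` class, its method names, and the two example system names are hypothetical. It shows an error staying internal until it is observed at the service interface, at which point it counts as a failure and acts as an external fault for the next system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class System:
    """Hypothetical model of one system in a chain of systems."""
    name: str
    downstream: Optional["System"] = None  # the system this one serves
    has_error: bool = False                # internal erroneous state
    log: List[str] = field(default_factory=list)

    def activate_fault(self, origin: str) -> None:
        # A fault activates and produces an error inside the boundary.
        self.has_error = True
        self.log.append(f"{origin} fault activated: error in {self.name}")

    def observe_at_interface(self) -> bool:
        # An error observed at the service interface becomes a failure.
        # Returns True if a failure was observed, False otherwise.
        if not self.has_error:
            return False
        self.log.append(f"error observed externally: failure of {self.name}")
        if self.downstream is not None:
            # The failure is an external fault for the system being served.
            self.downstream.activate_fault(origin="external")
        return True


# Assumed example names, for illustration only.
delivery = System("delivery")
planning = System("planning", downstream=delivery)
planning.activate_fault(origin="internal")
planning.observe_at_interface()
print(delivery.has_error)
```

The design keeps "error" as internal state and "failure" as an event at the boundary, which mirrors the distinction between error detection and error observation drawn in the text.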
From a safety perspective, if humans are involved in the system, failures can be categorized as follows. If a failure does not affect the human, it is considered a near miss. If a failure affects the human but causes no harm, it is classified as an incident. If a failure results in harm, it is deemed an accident [5, 6].
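The three categories can be written as a small decision function. The sketch below is an illustration under assumed names (`Outcome`, `classify_failure`, and the two boolean inputs are not from the paper); it simply encodes the rule stated above.

```python
from enum import Enum


class Outcome(Enum):
    NEAR_MISS = "near miss"
    INCIDENT = "incident"
    ACCIDENT = "accident"


def classify_failure(affected_human: bool, caused_harm: bool) -> Outcome:
    """Classify an observed failure by its effect on the human."""
    if not affected_human:
        if caused_harm:
            # Harm without the human being affected is contradictory input.
            raise ValueError("harm requires that the human was affected")
        return Outcome.NEAR_MISS
    # The failure affected the human: harm separates accident from incident.
    return Outcome.ACCIDENT if caused_harm else Outcome.INCIDENT


print(classify_failure(affected_human=True, caused_harm=False).value)
```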
Figure 1 provides a summary of these definitions. It illustrates the relationship between fault, error, and failure. The figure suggests that functional safety applies primarily to engineering or technical domains when no patient is involved. If a patient is involved, functional safety falls under the broader category of patient safety.
The human factor covers the entire system and timeline. Human errors, in contrast, occur within a short span of time or at a single instant [7].
Figure 2 uses a wrong couch density override as an example to demonstrate that the fault creates an error, which propagates to the patient. If the error is observed, it becomes a failure and, depending on the beam intensity, is categorized as a near miss, incident, or accident (adverse event). Detailed scenarios of the case are summarized in Table 1.
Daily QA trending serves as a helpful example. The beam output deviates from the baseline every day. These daily deviations are observed but are not termed errors or failures, because we define an error as any deviation beyond the 5% threshold. If the deviation exceeds 5%, it remains an error until a physicist examines the monitor and identifies that the deviation is beyond 5%. At that moment, it is externally observed and is termed a failure. If no physicist looks at the monitor, the error remains in the system as an error and continues to propagate until it ultimately manifests as a failure in patient outcome observations.
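The daily QA rule can be made concrete with a short sketch. The 5% threshold comes from the text; everything else (the function names, the `reviewed_by_physicist` flag, and the use of an absolute percentage deviation from baseline) is an assumption made for illustration.

```python
from enum import Enum

TOLERANCE_PERCENT = 5.0  # threshold stated in the text


class QAState(Enum):
    DEVIATION = "deviation"  # within tolerance: neither error nor failure
    ERROR = "error"          # beyond tolerance, not yet observed externally
    FAILURE = "failure"      # beyond tolerance and observed by a physicist


def percent_deviation(measured: float, baseline: float) -> float:
    """Signed deviation of the measured output from baseline, in percent."""
    return 100.0 * (measured - baseline) / baseline


def qa_state(measured: float, baseline: float,
             reviewed_by_physicist: bool) -> QAState:
    """Label a daily output reading using the fault/error/failure terms."""
    deviation = abs(percent_deviation(measured, baseline))
    if deviation <= TOLERANCE_PERCENT:
        return QAState.DEVIATION
    # Beyond tolerance: an error until someone observes it externally.
    return QAState.FAILURE if reviewed_by_physicist else QAState.ERROR


print(qa_state(106.0, 100.0, reviewed_by_physicist=False).value)
print(qa_state(106.0, 100.0, reviewed_by_physicist=True).value)
```

Note that the same measurement yields two different labels depending only on whether it has been observed, which is the point of the example in the text.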
Within our community, we often refer to “preventing error” [3]. As discussed earlier, when we refer to an initial error, we mostly mean human error, where the root cause lies within the human factor and the remedy is to create conditions that minimize the likelihood of human error. What reports such as 394 refer to is what we call “preventing error propagation” here. Moreover, “preventing error” carries a negative connotation and implies that errors are intrinsically evil. This view does not hold in today's dynamic and adaptive systems. Errors are essential for a learning organization, and report 394 indeed uses them for training and learning [8]. Modern terms such as “managing errors” seem more suitable.
The author was solely responsible for the conception, design, analysis, and writing of this manuscript.
The author declares no conflicts of interest.
This study did not involve human or animal subjects and therefore did not require ethics approval.
About the journal:
Journal of Applied Clinical Medical Physics is an international Open Access publication dedicated to clinical medical physics. JACMP welcomes original contributions dealing with all aspects of medical physics from scientists working in clinical medical physics around the world. JACMP accepts only online submissions.
JACMP will publish:
-Original Contributions: Peer-reviewed investigations that represent new and significant contributions to the field. Recommended word count: up to 7500.
-Review Articles: Reviews of major areas or sub-areas in the field of clinical medical physics. These articles may be of any length and are peer reviewed.
-Technical Notes: These should be no longer than 3000 words, including key references.
-Letters to the Editor: Comments on papers published in JACMP or on any other matters of interest to clinical medical physics. These should not be more than 1250 words (including the literature), and their publication is based only on the decision of the editor, who occasionally consults experts on the merit of the contents.
-Book Reviews: The editorial office solicits Book Reviews.
-Announcements of Forthcoming Meetings: The Editor may provide notice of forthcoming meetings, course offerings, and other events relevant to clinical medical physics.
-Parallel Opposed Editorial: We welcome topics relevant to clinical practice and the medical physics profession. The contents can be a controversial debate or opposing aspects of an issue. One author argues for the position and the other against. Each side of the debate contains an opening statement of up to 800 words, followed by a rebuttal of up to 500 words. Readers interested in participating in this series should contact the moderator with a proposed title and a short description of the topic.