Fault, error, and failure


Journal of Applied Clinical Medical Physics Pub Date : 2025-04-21 DOI:10.1002/acm2.70106

Mohammad Bakhtiari



Abstract

The sequence and terminology are crucial. The chain is causal and cyclic: a fault is activated and triggers an error, and the error propagates as further errors. When an error is observed in the external environment, it becomes a failure. This failure can subsequently cause a fault in the system it serves, and the cycle continues.

Understanding systems and their boundaries is essential because the definitions of fault, error, and failure depend on whether the event is internal or external to a system. A system is an entity that interacts with other entities, including hardware, software, humans, and the physical world. It is composed of components, each of which can be another system, creating a recursive structure that stops when a component is considered atomic. The system boundary is crucial because it defines the common frontier between the system and its environment and so determines the inputs and outputs of the system. A fault occurring within this boundary is the cause of an error. This error propagates through the system's internal states and may eventually become visible as a failure in the system's external state. This failure, perceived at the service interface, can act as an external fault for another system, initiating a new cycle of fault, error, and failure. The distinction between error detection and error observation is subtle but significant. Error detection refers to identifying discrepancies internally within a system, whereas error observation occurs when these discrepancies manifest externally, leading to an observable failure state.
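The fault, error, and failure cycle across a system boundary can be sketched as a small state model. This is only an illustrative sketch, not a model taken from the article: the class name `System`, its method names, the state labels, and the example system names "planning" and "delivery" are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class System:
    """Hypothetical system with a boundary and a service interface."""
    name: str
    serves: Optional["System"] = None   # system that consumes this system's service
    state: str = "ok"                   # "ok" -> "fault" -> "error" -> "failure"
    history: List[str] = field(default_factory=list)

    def inject_fault(self, origin: str = "internal") -> None:
        # A fault is present inside the boundary (internal or external origin).
        self.state = "fault"
        self.history.append(f"fault ({origin})")

    def activate(self) -> None:
        # An activated fault triggers an error in the internal state.
        if self.state == "fault":
            self.state = "error"
            self.history.append("error")

    def detect_internally(self) -> bool:
        # Error detection: the discrepancy is identified inside the boundary.
        # Detection alone does not turn the error into a failure.
        return self.state == "error"

    def observe_externally(self) -> None:
        # Error observation: the error becomes visible at the service
        # interface, so it is now a failure of this system ...
        if self.state != "error":
            return
        self.state = "failure"
        self.history.append("failure")
        # ... and an external fault for the system it serves.
        if self.serves is not None:
            self.serves.inject_fault(origin=f"external, from {self.name}")
```

For example, with `delivery = System("delivery")` and `planning = System("planning", serves=delivery)`, calling `inject_fault()`, `activate()`, and `observe_externally()` on `planning` leaves `planning` in the failure state and `delivery` holding a new external fault, which starts the next cycle.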

From a safety perspective, if humans are involved in the system, failures can be categorized as follows. If a failure does not affect the human, it is considered a near miss. If a failure affects the human but causes no harm, it is classified as an incident. If a failure results in harm, it is deemed an accident.⁵,⁶
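The three categories can be written as a short decision rule. The sketch below is illustrative only; the names `Severity` and `classify_failure` and the two boolean inputs are assumptions made here, not terminology from the article.

```python
from enum import Enum


class Severity(Enum):
    NEAR_MISS = "near miss"
    INCIDENT = "incident"
    ACCIDENT = "accident"


def classify_failure(affected_human: bool, caused_harm: bool) -> Severity:
    """Classify an observed failure by its effect on the human."""
    if not affected_human:
        # Assumption of this sketch: harm cannot occur unless the
        # failure affected the human, so this combination is rejected.
        if caused_harm:
            raise ValueError("harm implies that the failure affected the human")
        return Severity.NEAR_MISS
    return Severity.ACCIDENT if caused_harm else Severity.INCIDENT
```

A failure that reaches the human without harm, `classify_failure(True, False)`, is an incident under this rule.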

Figure 1 provides a summary of these definitions and illustrates the relationship between fault, error, and failure. The figure indicates that functional safety applies primarily to engineering or technical domains in which no patient is involved. When a patient is involved, functional safety falls under the broader category of patient safety.

The human factor spans the entire system and its timeline. A human error, by contrast, occurs within a short interval or at a single instant.⁷

Figure 2 uses a wrong couch density override as an example: the fault creates an error, which propagates to the patient. If the error is observed, it becomes a failure and, depending on the beam intensity, is categorized as a near miss, an incident, or an accident (adverse event). Detailed scenarios for this case are summarized in Table 1.

Daily QA trending serves as a helpful example. The beam output deviates from the baseline every day. These daily deviations are observed but are not termed errors or failures, because we define an error as any deviation beyond the 5% threshold. A deviation that exceeds 5% is an error, and it remains an error until a physicist examines the monitor and identifies that the deviation is beyond 5%. At that moment, it is externally observed and is termed a failure. If no physicist looks at the monitor, the error remains in the system and continues to propagate until it ultimately manifests as a failure in patient outcome observations.
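The daily QA example can be expressed as a labeling rule over a trend of output deviations. The following is a minimal sketch under stated assumptions: the function names `qa_state` and `label_trend` are hypothetical, deviations are taken as signed percentages compared by absolute value, and "reviewed" stands for a physicist having examined the monitor on that day. Only the 5% threshold comes from the text.

```python
from typing import List, Set

THRESHOLD_PCT = 5.0  # error threshold stated in the text


def qa_state(deviation_pct: float, reviewed: bool) -> str:
    """Label one daily output deviation.

    Assumption: the sign of the deviation does not matter, so the
    absolute value is compared against the threshold.
    """
    if abs(deviation_pct) <= THRESHOLD_PCT:
        return "deviation"          # within tolerance: neither error nor failure
    # Beyond the threshold it is an error; it is termed a failure only
    # once it has been externally observed by a physicist.
    return "failure" if reviewed else "error"


def label_trend(deviations_pct: List[float], reviewed_days: Set[int]) -> List[str]:
    """Label each day of a trend; reviewed_days holds 0-based day indices."""
    return [qa_state(d, day in reviewed_days)
            for day, d in enumerate(deviations_pct)]
```

With the made-up trend `[0.8, 2.1, 5.6, 6.3]` and a review only on the last day, `label_trend` returns `["deviation", "deviation", "error", "failure"]`: the third day's error stays latent in the system because nobody looked at the monitor.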

Within our community, we often refer to “preventing error”.³ As discussed earlier, when we refer to an initial error, we mostly mean human error, whose root cause lies within the human factor; addressing it requires creating conditions that minimize the likelihood of human error. What reports such as 394 refer to is what we call “preventing error propagation” here. Moreover, “preventing error” carries a negative connotation and implies that errors are intrinsically evil. This view does not hold in today's dynamic and adaptive systems. Errors are essential for a learning organization, and report 394 does indeed use them for training and learning.⁸ Modern terms such as “managing errors” seem more suitable.

The author was solely responsible for the conception, design, analysis, and writing of this manuscript.

The author declares no conflicts of interest.

This study did not involve human or animal subjects and therefore did not require ethics approval.

