Addressing the label dilemma: A self-semi-supervised step-wise complementary label boosting strategy for industrial anomaly detection

IF 11 · Region 1 (Engineering Technology) · Q1 ENGINEERING, INDUSTRIAL

Reliability Engineering & System Safety Pub Date : 2025-06-28 DOI:10.1016/j.ress.2025.111369

Jiayang Yang, Chunhui Zhao

Volume 264, Article 111369 · https://www.sciencedirect.com/science/article/pii/S0951832025005708

Citations: 0

Abstract

Recently, artificial intelligence (AI) has been widely used in data-driven industrial anomaly detection. However, because the operating status of an industrial process is difficult to acquire reliably, most process data may be collected without rigorous examination; their exact status is therefore uncertain, which limits their safe use in AI-based anomaly detection modeling. In addition, even samples with definite annotations may carry status misjudgments caused by manual error, exposing the modeling to a significant risk of being misled. In this work, we treat anomaly detection as a binary classification task and frame these challenges as a modeling dilemma over sample labels (annotations indicating operating status, i.e., normal/abnormal), in which the available labels are both insufficient and unreliable. A self-semi-supervised step-wise complementary label boosting ($S^{4}$CLB) strategy is proposed to address this dilemma. The $S^{4}$CLB strategy consists of two main stages. In the first stage, a self-supervised contrastive autoencoding Gaussian mixture model (CAGMM) is developed to provide representations of all process samples for the subsequent anomaly detection, describing their data distribution with low-dimensional features. In the second stage, a semi-supervised label boosting strategy is designed in a step-wise manner: noisy label filtering and adaptive label enrichment are conducted alternately to progressively boost the sufficiency and reliability of the available labels. Meanwhile, a robust dual complementary classifier (RDCC) model, comprising two robust peer classifiers with different views, is developed to provide prompt feedback for label boosting, which further guarantees the reliability of label adjustment. Finally, the anomaly detection results are obtained from the RDCC model. The effectiveness of the proposed method is verified on a real industrial process.
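The alternation between noisy label filtering and label enrichment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so everything below is an assumption made for illustration. The features `X` stand in for the low-dimensional representations that the CAGMM stage would provide, two nearest-centroid classifiers on disjoint feature halves stand in for the RDCC peer classifiers, and the names `UNLABELED`, `peer_consensus`, `boost_labels`, and the `conf` threshold are hypothetical.

```python
import numpy as np

UNLABELED = -1  # hypothetical marker for samples without a usable label


def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier: one mean per class (0 normal, 1 abnormal)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def predict_proba(centroids, X):
    """Softmax over negative squared distances to the two class centroids."""
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def peer_consensus(X, y, conf):
    """Fit one classifier per feature view (assumed: first and second half of
    the columns). Return the predictions and a mask of samples on which both
    peers agree, each with confidence >= conf."""
    half = X.shape[1] // 2
    labeled = y != UNLABELED
    preds, sure = [], []
    for view in (slice(0, half), slice(half, None)):
        centroids = fit_centroids(X[labeled][:, view], y[labeled])
        p = predict_proba(centroids, X[:, view])
        preds.append(p.argmax(axis=1))
        sure.append(p.max(axis=1) >= conf)
    agree = (preds[0] == preds[1]) & sure[0] & sure[1]
    return preds[0], agree


def boost_labels(X, y, n_rounds=3, conf=0.9):
    """Alternate label filtering and label enrichment; returns a new label array.
    Assumes both classes have at least one labeled sample on entry."""
    y = y.copy()
    for _ in range(n_rounds):
        # Step 1, filtering: drop labels that both peers confidently contradict.
        pred, agree = peer_consensus(X, y, conf)
        noisy = (y != UNLABELED) & agree & (pred != y)
        y[noisy] = UNLABELED
        if not all((y == c).any() for c in (0, 1)):
            break  # a class lost all its labels; the classifiers cannot be refit
        # Step 2, enrichment: pseudo-label unlabeled samples the peers agree on.
        pred, agree = peer_consensus(X, y, conf)
        enrich = (y == UNLABELED) & agree
        y[enrich] = pred[enrich]
    return y
```

Requiring agreement between two classifiers that see different feature views before touching any label is one simple way to make label adjustment conservative; the paper's RDCC model may achieve robustness quite differently.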




Source journal

Reliability Engineering & System Safety (Management Science; Engineering: Industrial)

CiteScore: 15.20
Self-citation rate: 39.50%
Publications: 621
Review time: 67 days

About the journal: Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, such as nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems, or that present techniques and/or theoretical results with a discernible relationship to the solution of such problems. An important aim is to balance academic material and practical applications.