Addressing the label dilemma: A self-semi-supervised step-wise complementary label boosting strategy for industrial anomaly detection
Authors: Jiayang Yang, Chunhui Zhao
Journal: Reliability Engineering & System Safety, Volume 264, Article 111369 (JCR Q1, Engineering, Industrial; impact factor 11.0)
Published: 2025-06-28
DOI: 10.1016/j.ress.2025.111369
URL: https://www.sciencedirect.com/science/article/pii/S0951832025005708
Citations: 0
Abstract
Artificial Intelligence (AI) technology has recently been employed extensively in data-driven industrial anomaly detection. However, because the operating status of industrial processes is difficult to acquire reliably, most process data may be collected without rigorous examination, which leaves their exact statuses uncertain and limits their safe use in AI-powered anomaly detection modeling. Additionally, samples with definite annotations may still carry status misjudgments caused by manual error, exposing anomaly detection modeling to a significant risk of being misled. In this work, we formulate anomaly detection as a binary classification task and recognize the aforementioned challenges as a modeling dilemma involving sample labels (annotations indicating operating status, i.e., normal/abnormal), in which the available labels are simultaneously insufficient and unreliable. A self-semi-supervised step-wise complementary label boosting (S⁴CLB) strategy is proposed to address this dilemma. The S⁴CLB strategy consists mainly of two stages. In the first stage, a self-supervised contrastive autoencoding Gaussian mixture model (CAGMM) is developed to provide representations of all process samples for the subsequent anomaly detection by describing their data distribution information with low-dimensional features. In the second stage, a semi-supervised label boosting strategy is designed in a step-wise manner. Specifically, noisy label filtering and adaptive label enrichment are conducted alternately to progressively boost the sufficiency and reliability of the available labels. Meanwhile, a robust dual complementary classifier (RDCC) model, comprising two robust peer classifiers with different views, is developed to provide prompt feedback for label boosting, which further guarantees the reliability of label adjustment. Finally, the anomaly detection results are obtained by the RDCC model. The effectiveness of the proposed method is verified on a real industrial process.
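The alternating second stage can be illustrated with a toy sketch. This is not the authors' S⁴CLB, CAGMM, or RDCC implementation: the abstract gives no algorithmic details, so everything below is an assumption made for illustration. In particular, the two "peer classifiers" are replaced by simple nearest-centroid rules that each see one feature (a stand-in for "different views"), and the names `fit_centroids`, `predict`, `peer_consensus`, `boost_labels`, and the `margin` threshold are hypothetical.

```python
from statistics import mean

def fit_centroids(values, labels):
    """Per-class mean of one feature over currently labeled samples.
    Assumes each class keeps at least one label (otherwise mean() raises)."""
    return {c: mean(v for v, y in zip(values, labels) if y == c) for c in (0, 1)}

def predict(centroids, v):
    """Nearest-centroid prediction plus a distance margin as crude confidence."""
    d0, d1 = abs(v - centroids[0]), abs(v - centroids[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)

def peer_consensus(X, labels, margin):
    """Label on which both single-feature peers confidently agree, else None."""
    models = [fit_centroids([x[k] for x in X], labels) for k in (0, 1)]
    out = []
    for x in X:
        (p0, m0), (p1, m1) = (predict(models[k], x[k]) for k in (0, 1))
        out.append(p0 if p0 == p1 and min(m0, m1) >= margin else None)
    return out

def boost_labels(X, labels, rounds=3, margin=1.0):
    """Alternate noisy-label filtering and label enrichment (toy version)."""
    labels = list(labels)
    for _ in range(rounds):
        # Filtering: drop a label that both peers confidently contradict.
        consensus = peer_consensus(X, labels, margin)
        labels = [None if y is not None and c is not None and c != y else y
                  for y, c in zip(labels, consensus)]
        # Enrichment: refit on the filtered labels, then label unlabeled
        # samples on which both peers confidently agree.
        consensus = peer_consensus(X, labels, margin)
        labels = [c if y is None else y for y, c in zip(labels, consensus)]
    return labels

# Hypothetical data: 0 = normal, 1 = abnormal, None = unlabeled.
X = [(0.0, 0.2), (0.3, -0.1), (-0.2, 0.1), (0.1, 0.0), (0.2, 0.3),
     (5.0, 5.2), (4.8, 4.9), (5.3, 5.1), (5.1, 4.7), (4.9, 5.0)]
noisy = [0, 0, 1, None, None, 1, 1, 1, None, None]  # index 2 is mislabeled
boosted = boost_labels(X, noisy)
print(boosted)
```

On this toy data the wrong label at index 2 is first discarded by the filtering step and then reassigned during enrichment, while the unlabeled samples receive labels only because both peers agree with a margin above the threshold. Requiring agreement between two differently informed classifiers is one simple way to make label changes conservative; the paper's actual feedback mechanism may differ.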
About the journal:
Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, like nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems or present techniques and/or theoretical results that have a discernable relationship to the solution of such problems. An important aim is to balance academic material and practical applications.