Review on Information Fusion-Based Data Mining for Improving Complex Anomaly Detection
Sorin-Claudiu Moldovan, Laszlo Barna Iantovics
WIREs Data Mining and Knowledge Discovery, published 2025-05-09
DOI: https://doi.org/10.1002/widm.70017
Abstract
Anomaly detection based on multiple distributed hybrid sensors frequently relies on hybrid approaches that integrate techniques from statistical analysis, probability, data mining, machine learning, deep learning, and signal denoising. Many of these methods analyze irregularities, data continuity, correlation, and data consistency in order to distinguish anomalous patterns from normal behavior. By leveraging these techniques, information fusion aims to enhance situational awareness, detect potential threats or abnormalities, and improve decision-making in complex environments. It addresses uncertainty by integrating data from diverse sources, thereby improving performance and reducing dependency on individual sensors. This study examines applications based on single- and multiple-sensor data, revealing common strategies, identifying strengths and weaknesses, and outlining potential solutions for detecting and diagnosing anomalies by analyzing small, large, and complex data sets from homogeneous or heterogeneous systems. Information fusion techniques are evaluated for their performance at various levels of algorithmic complexity. This in-depth bibliographic study searched top indexing databases such as Web of Science and Scopus. Articles published between 2012 and 2024 were eligible for inclusion. The search used the following keywords: "sensor malfunction," "sensor anomaly," "sensor failure," "sensor fusion," and "anomaly data mining." Publications that did not strictly focus on analytical processing for anomaly detection, diagnosis, and prognosis in sensor data were excluded.

In conclusion, information fusion promotes transparency by making explicit how information is combined, which allows a multitude of perspectives to be included and aligns with established best practices in the field. Data deviation remains the primary criterion for detecting anomalies, mostly with deep learning and, extensively, with hybrid techniques. Nevertheless, state-of-the-art algorithms based on neural networks still require further contextual interpretation and analysis. Breaches of functional safety and of the safety of the intended functionality can lead to decision-making errors, physical harm, and erosion of trust in autonomous systems. These risks stem from the lack of interpretability in AI approaches, which makes it difficult to predict and understand a system's behavior under various conditions.
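To make the idea of deviation-based detection over fused sensor data concrete, the following is a minimal sketch and not a method taken from the reviewed literature. It assumes several redundant sensors measure the same quantity, fuses them with a median, and flags any sensor whose reading deviates from the fused value by more than a chosen multiple of a robust spread estimate (the median absolute deviation, MAD). The function name `fuse_and_flag`, the sensor names, the threshold of 3.0, and the choice of median/MAD are all illustrative assumptions.

```python
from statistics import median


def fuse_and_flag(readings, threshold=3.0, min_scale=1e-9):
    """Fuse redundant sensor readings and flag deviating sensors.

    readings: dict mapping a (hypothetical) sensor name to its reading.
    Returns (fused_value, sorted list of flagged sensor names).
    """
    values = list(readings.values())
    # Median as a robust consensus value across all sensors.
    fused = median(values)
    # Median absolute deviation (MAD) as a robust estimate of spread;
    # 1.4826 rescales it to be comparable to a standard deviation
    # under the assumption of roughly Gaussian noise.
    mad = median(abs(v - fused) for v in values)
    scale = max(1.4826 * mad, min_scale)  # avoid division by zero
    flagged = sorted(
        name
        for name, value in readings.items()
        if abs(value - fused) / scale > threshold
    )
    return fused, flagged


# Illustrative data: three agreeing sensors and one faulty one.
readings = {"temp_a": 20.1, "temp_b": 19.9, "temp_c": 20.0, "temp_d": 35.0}
fused, flagged = fuse_and_flag(readings)
print(fused, flagged)
```

With these made-up readings, only `temp_d` is flagged, and the fused value stays close to the three agreeing sensors. This shows in miniature how combining sources reduces dependency on any single sensor. Real systems surveyed in such reviews typically add temporal context, heterogeneous sensor models, or learned detectors on top of this kind of baseline.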