SinkFlow: Fast and traceable root-cause localization for multidimensional anomaly events

IF 7.5 | Zone 2, Computer Science | JCR Q1, AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence | Pub Date: 2024-11-22 | DOI: 10.1016/j.engappai.2024.109582

Zhichao Hu, Likun Liu, Lina Ma, Xiangzhan Yu

Volume 139, Article 109582 | https://www.sciencedirect.com/science/article/pii/S0952197624017408

Citations: 0

Abstract



With the development of various artificial intelligence (AI)-based applications, detecting anomalies and analyzing their root causes in massive data are critical to increasing the usability of AI. Fast, accurate root-cause analysis (RCA) that finds the main reason for an anomaly, together with a reasonable explanation, helps solve problems effectively. Thus, RCA plays an important role in troubleshooting and fault diagnosis, which makes its application in data analysis crucial. Previous root-cause-localization approaches for multidimensional anomaly events use various techniques to reduce the search space and have improved localization performance. However, they do not effectively balance the requirements of performance, compatibility, and interpretability. To solve these problems, we propose a new root-cause-localization method called SinkFlow. It provides a unified framework, the event-aggregation graph (EAG), to describe the constraints of event aggregation and the relations between events, so it can be easily generalized to various domains. SinkFlow introduces a measure evaluation method applicable to both fundamental and derived measures to quantify the impact of events. It also uses an optimal search strategy that reduces the search space based on anomaly behavioral consistency and deviation significance. Our experimental results on semisynthetic datasets show that SinkFlow outperformed the other baselines and ran much faster, achieving a 1.88% increase in F1-score at only 25% of the time cost of the second-best localization method. In addition, SinkFlow offered clear, visible explanations of the localization results, answering why they are root causes and how the anomaly is formed.
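To make the problem setting concrete, here is a minimal Python sketch of multidimensional event aggregation. It is not the SinkFlow algorithm, which the abstract does not specify; it only aggregates leaf-level records over every combination of attributes and ranks the aggregated events by the share of the total deviation of a fundamental measure that each one explains. The attribute names (`region`, `isp`), the toy records, and the `explained_share` score are all assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy leaf records: (attributes, forecast value, actual value).
# All names and numbers here are made up for illustration.
RECORDS = [
    ({"region": "east", "isp": "A"}, 100.0, 40.0),
    ({"region": "east", "isp": "B"}, 100.0, 45.0),
    ({"region": "west", "isp": "A"}, 100.0, 98.0),
    ({"region": "west", "isp": "B"}, 100.0, 101.0),
]


def aggregate(records, dims):
    """Sum forecast and actual values over the attributes in `dims`."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for attrs, forecast, actual in records:
        key = tuple((d, attrs[d]) for d in dims)
        totals[key][0] += forecast
        totals[key][1] += actual
    return totals


def rank_events(records):
    """Rank aggregated events by the share of total deviation they explain.

    Assumes the total deviation is non-zero (an anomaly is present).
    """
    total_dev = sum(f - a for _, f, a in records)
    dims_all = sorted(records[0][0])
    ranked = []
    for size in range(1, len(dims_all) + 1):
        for dims in combinations(dims_all, size):
            for key, (forecast, actual) in aggregate(records, dims).items():
                explained_share = (forecast - actual) / total_dev
                ranked.append((explained_share, key))
    # Highest share first; on ties, prefer the coarser (shorter) event.
    ranked.sort(key=lambda item: (-item[0], len(item[1])))
    return ranked


top_share, top_event = rank_events(RECORDS)[0]
print(top_event, round(top_share, 3))
```

On this toy data the coarse event `region=east` explains about 99% of the total deviation, more than any single `isp` value or any finer region-and-ISP pair, so it is the natural candidate to report. An exhaustive scan like this grows exponentially with the number of attributes, which is why methods in this area, SinkFlow included according to the abstract, focus on reducing the search space.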


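Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60

Self-citation rate: 10.00%

Articles published: 505

Review time: 68 days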

About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.