Zekun Zhang, Jian Wang, Bing Li, Yu Liu, Hongyue Wu, Patrick C.K. Hung
Information and Software Technology, volume 189, Article 107938. Published 2025-10-24. DOI: 10.1016/j.infsof.2025.107938
Impact factor 4.3; JCR Q2 (Computer Science, Information Systems).
https://www.sciencedirect.com/science/article/pii/S0950584925002770
MHP-RCA: Multivariate Hawkes Process-based Root Cause Analysis in microservice systems
Context:
Recent years have witnessed a prevailing trend of developing applications using microservice architectures. Microservice systems typically involve multiple containers that share resources on a single physical host, thereby complicating the interdependencies among microservices. This complexity significantly hinders the identification of root causes of performance issues.
Objective:
Performance issues can manifest in various forms. Existing approaches often overlook other potential failure indicators, such as process anomalies that are discernible in audit logs. This paper aims to refine the granularity of root cause analysis to the process level.
Methods:
This paper proposes a novel approach called MHP-RCA (Multivariate Hawkes Process-based Root Cause Analysis), which integrates diverse data types, including metrics and audit logs, to localize the root cause in microservice systems. MHP-RCA generates anomalous events from the observable data, then leverages the multivariate Hawkes process to construct causal graphs for effective root cause identification.
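As a rough illustration of the idea behind the Hawkes-process step, the sketch below evaluates the conditional intensity of a multivariate Hawkes process and reads directed edges off its excitation matrix. It is not the authors' implementation: the exponential kernel, the parameter names (`mu`, `alpha`, `beta`), the thresholding rule, and the function names are all assumptions made for this example, since the abstract does not specify them.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of every dimension at time t.

    Assumed model (exponential kernel, not confirmed by the abstract):
        lambda_i(t) = mu[i] + sum over past events (s, j) of
                      alpha[i][j] * exp(-beta * (t - s))

    events: list of (time, dimension) pairs, e.g. anomalous events
    mu:     baseline rate per dimension
    alpha:  alpha[i][j] is how strongly an event in j excites i
    beta:   decay rate shared by all kernels
    """
    lam = list(mu)
    for s, j in events:
        if s < t:  # only events strictly before t contribute
            decay = math.exp(-beta * (t - s))
            for i in range(len(mu)):
                lam[i] += alpha[i][j] * decay
    return lam

def causal_edges(alpha, threshold):
    """Hypothetical rule: keep an edge j -> i when alpha[i][j] > threshold."""
    return [
        (j, i)
        for i, row in enumerate(alpha)
        for j, a in enumerate(row)
        if i != j and a > threshold
    ]

# Two event types: type 0 excites type 1, but not the other way round.
mu = [0.1, 0.1]
alpha = [[0.0, 0.0], [0.5, 0.0]]
events = [(0.0, 0)]  # one event of type 0 at time 0

print(hawkes_intensity(1.0, events, mu, alpha, beta=1.0))
print(causal_edges(alpha, threshold=0.1))
```

In this toy setup the only edge recovered is from event type 0 to event type 1. In practice `alpha` would be estimated from the observed event sequences rather than set by hand.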
Results:
Extensive experiments, involving the injection of various anomalies into four widely used open-source benchmarks, demonstrate that MHP-RCA surpasses multiple baseline methods in most cases. Compared to the best-performing baseline approach, MHP-RCA achieves an average overall improvement of 2.5% in AC@1 and 3.7% in AC@5.
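For readers unfamiliar with the AC@k metric, the sketch below uses a definition common in the root cause analysis literature: the fraction of fault cases whose true root cause appears among the top k ranked candidates. The abstract does not define the metric, so this definition, the function name `ac_at_k`, and the toy data are assumptions.

```python
def ac_at_k(ranked_lists, true_causes, k):
    """Fraction of cases whose true root cause is in the top-k candidates.

    ranked_lists: one ranked candidate list per fault case (best first)
    true_causes:  the injected root cause for each case
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, true_causes)
        if truth in ranked[:k]
    )
    return hits / len(true_causes)

# Three hypothetical fault cases, each with root cause "a".
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truth = ["a", "a", "a"]

print(ac_at_k(ranked, truth, 1))
print(ac_at_k(ranked, truth, 3))
```

Under this definition AC@1 rewards only exact top-ranked hits, while AC@5 tolerates the true cause appearing anywhere in the first five candidates.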
Conclusion:
The proposed method MHP-RCA, which considers audit logs and metrics, can localize the root cause of microservice anomalies at the process level.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.