Stochastic conformance checking based on variable-length Markov chains

IF 3.4 · CAS Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Pub Date : 2025-05-09 DOI:10.1016/j.is.2025.102561

Emilio Incerto, Andrea Vandin, Sima Sarv Ahrabi

Information Systems, volume 133, Article 102561 (journal article). Publisher page: https://www.sciencedirect.com/science/article/pii/S0306437925000456

Citations: 0

Stochastic conformance checking based on variable-length Markov chains

Conformance checking is central to process mining (PM). It studies deviations of logs from reference processes. Originally, the proposed approaches did not focus on stochastic aspects of the underlying process, and gave qualitative models as output. Recently, these have been extended in approaches for stochastic conformance checking (SCC), giving quantitative models as output. A different community, namely software performance engineering (PE), which has studied the synthesis of stochastic processes for decades, has independently developed techniques to synthesize Markov chains (MCs) that describe the stochastic process underlying program runs. However, these were never applied to SCC problems. We propose a novel approach to SCC based on PE results for the synthesis of stochastic processes. Through an extensive experimental evaluation, we show that it outperforms the state of the art. In doing so, we further bridge PE and PM, fostering cross-fertilization. We use techniques for the synthesis of variable-length MCs (VLMCs), higher-order MCs able to compactly encode complex path dependencies in the control flow. VLMCs are equipped with a notion of likelihood that a trace belongs to a model. We use it to perform SCC of a log against a model. We establish the degree of conformance by equipping VLMCs with uEMSC, a standard conformance measure in the SCC literature. We compare against 18 SCC techniques from the PM literature, using 11 benchmark datasets from the PM community. We outperform all approaches on 10 of the 11 datasets, i.e., we get uEMSC values closer to 1 for logs conforming to a model. Furthermore, we show that VLMCs are efficient, as they handled all considered datasets in a few seconds.
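The two ingredients named in the abstract, a trace likelihood under a variable-length Markov chain and the uEMSC measure, can be illustrated with a small sketch. This is not the authors' synthesis algorithm or tool: it assumes a much simpler context model (longest observed suffix up to `max_order`, kept if seen at least `min_count` times, with no statistical pruning), and the names `fit_contexts`, `trace_likelihood`, `uemsc`, `START` and `END` are hypothetical. The uEMSC formula used is the usual unit-distance one, 1 minus the sum over log traces t of max(L(t) - M(t), 0), where L(t) is the relative frequency of t in the log and M(t) its probability under the model.

```python
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical start/end-of-trace markers


def fit_contexts(log, max_order=2):
    """Count next-symbol frequencies for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for trace in log:
        seq = [START] + list(trace) + [END]
        for i in range(1, len(seq)):
            for k in range(max_order + 1):
                if k > i:
                    break
                counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts


def next_prob(counts, history, sym, max_order, min_count):
    """P(sym | longest sufficiently observed suffix of history)."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = tuple(history[len(history) - k:])
        total = sum(counts[ctx].values()) if ctx in counts else 0
        if total > 0 and total >= min_count:
            return counts[ctx][sym] / total
    return 0.0


def trace_likelihood(counts, trace, max_order=2, min_count=1):
    """Probability that the model emits exactly this trace and then stops."""
    seq = [START] + list(trace) + [END]
    p = 1.0
    for i in range(1, len(seq)):
        p *= next_prob(counts, seq[:i], seq[i], max_order, min_count)
    return p


def uemsc(log, model_prob):
    """uEMSC = 1 - sum over log traces of max(L(t) - M(t), 0)."""
    freq = Counter(tuple(t) for t in log)
    n = sum(freq.values())
    return 1.0 - sum(max(c / n - model_prob(t), 0.0) for t, c in freq.items())


# Toy log (made up for illustration): three traces <a,b> and one trace <a,c>.
log = [("a", "b")] * 3 + [("a", "c")]
model = fit_contexts(log, max_order=1)
print(trace_likelihood(model, ("a", "b"), max_order=1))
print(uemsc(log, lambda t: trace_likelihood(model, t, max_order=1)))
```

On this toy log the model reproduces the trace frequencies, so the conformance value is at its maximum of 1; a log containing only the rarer trace would score lower against the same model, which matches the abstract's reading that values closer to 1 indicate a log conforming to the model.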

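Source journal

Information Systems (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days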

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.