{"title":"基于变长马尔可夫链的随机一致性检验","authors":"Emilio Incerto , Andrea Vandin , Sima Sarv Ahrabi","doi":"10.1016/j.is.2025.102561","DOIUrl":null,"url":null,"abstract":"<div><div>Conformance checking is central in process mining (PM). It studies deviations of logs from reference processes. Originally, the proposed approaches did not focus on stochastic aspects of the underlying process, and gave qualitative models as output. Recently, these have been extended in approaches for <em>stochastic conformance checking</em> (SCC), giving quantitative models as output. A different community, namely the <em>software performance engineering</em> (PE) one, interested in the synthesis of stochastic processes since decades, has developed independently techniques to synthesize Markov Chains (MC) that describe the stochastic process underlying program runs. However, these were never applied to SCC problems. We propose a novel approach to SCC based on PE results for the synthesis of stochastic processes. Thanks to a rich experimental evaluation, we show that it outperforms the state-of-the-art. In doing so, we further bridge PE and PM, fostering cross-fertilization. We use techniques for the synthesis of Variable-length MC (VLMC), higher-order MC able to compactly encode complex path dependencies in the control-flow. VLMCs are equipped with a notion of likelihood that a trace belongs to a model. We use it to perform SCC of a log against a model. We establish the degree of conformance by equipping VLMCs with uEMSC, a standard conformance measure in the SCC literature. We compare with 18 SCC techniques from the PM literature, using 11 benchmark datasets from the PM community. We outperform all approaches in 10 out of 11 datasets, i.e., we get uEMSC values closer to 1 for logs conforming to a model. Furthermore, we show that VLMC are efficient, as they handled all considered datasets in a few seconds.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"133 ","pages":"Article 102561"},"PeriodicalIF":3.0000,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stochastic conformance checking based on variable-length Markov chains\",\"authors\":\"Emilio Incerto , Andrea Vandin , Sima Sarv Ahrabi\",\"doi\":\"10.1016/j.is.2025.102561\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Conformance checking is central in process mining (PM). It studies deviations of logs from reference processes. Originally, the proposed approaches did not focus on stochastic aspects of the underlying process, and gave qualitative models as output. Recently, these have been extended in approaches for <em>stochastic conformance checking</em> (SCC), giving quantitative models as output. A different community, namely the <em>software performance engineering</em> (PE) one, interested in the synthesis of stochastic processes since decades, has developed independently techniques to synthesize Markov Chains (MC) that describe the stochastic process underlying program runs. However, these were never applied to SCC problems. We propose a novel approach to SCC based on PE results for the synthesis of stochastic processes. Thanks to a rich experimental evaluation, we show that it outperforms the state-of-the-art. In doing so, we further bridge PE and PM, fostering cross-fertilization. We use techniques for the synthesis of Variable-length MC (VLMC), higher-order MC able to compactly encode complex path dependencies in the control-flow. VLMCs are equipped with a notion of likelihood that a trace belongs to a model. We use it to perform SCC of a log against a model. We establish the degree of conformance by equipping VLMCs with uEMSC, a standard conformance measure in the SCC literature. We compare with 18 SCC techniques from the PM literature, using 11 benchmark datasets from the PM community. We outperform all approaches in 10 out of 11 datasets, i.e., we get uEMSC values closer to 1 for logs conforming to a model. Furthermore, we show that VLMC are efficient, as they handled all considered datasets in a few seconds.</div></div>\",\"PeriodicalId\":50363,\"journal\":{\"name\":\"Information Systems\",\"volume\":\"133 \",\"pages\":\"Article 102561\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2025-05-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306437925000456\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306437925000456","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Stochastic conformance checking based on variable-length Markov chains
Conformance checking is central in process mining (PM). It studies deviations of logs from reference processes. Originally, the proposed approaches did not focus on stochastic aspects of the underlying process, and gave qualitative models as output. Recently, these have been extended in approaches for stochastic conformance checking (SCC), giving quantitative models as output. A different community, namely the software performance engineering (PE) one, interested in the synthesis of stochastic processes since decades, has developed independently techniques to synthesize Markov Chains (MC) that describe the stochastic process underlying program runs. However, these were never applied to SCC problems. We propose a novel approach to SCC based on PE results for the synthesis of stochastic processes. Thanks to a rich experimental evaluation, we show that it outperforms the state-of-the-art. In doing so, we further bridge PE and PM, fostering cross-fertilization. We use techniques for the synthesis of Variable-length MC (VLMC), higher-order MC able to compactly encode complex path dependencies in the control-flow. VLMCs are equipped with a notion of likelihood that a trace belongs to a model. We use it to perform SCC of a log against a model. We establish the degree of conformance by equipping VLMCs with uEMSC, a standard conformance measure in the SCC literature. We compare with 18 SCC techniques from the PM literature, using 11 benchmark datasets from the PM community. We outperform all approaches in 10 out of 11 datasets, i.e., we get uEMSC values closer to 1 for logs conforming to a model. Furthermore, we show that VLMC are efficient, as they handled all considered datasets in a few seconds.
期刊介绍:
Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems.
Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems such as new data models, performance enhancements, and show how those innovations contribute to the goals of the application.