Efficient Online Computation of Business Process State From Trace Prefixes via N-Gram Indexing

IF 5.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Services Computing Pub Date : 2025-03-03 DOI:10.1109/TSC.2025.3547235

David Chapela-Campa;Marlon Dumas

Vol. 18, No. 2, pp. 770-783. https://ieeexplore.ieee.org/document/10908906/

Citations: 0

Abstract

This paper addresses the following problem: Given a process model and an event log containing trace prefixes of ongoing cases of a process, map each case to its corresponding state (i.e., marking) in the model. This state computation operation is a building block of other process mining operations, such as log animation and short-term simulation. An approach to this state computation problem is to perform a token-based replay of each trace prefix against the model. However, when a trace prefix does not strictly follow the behavior of the model, token replay may produce a state that is not reachable from the initial state of the process. An alternative approach is to first compute an alignment between the trace prefix of each ongoing case and the model, and then replay the aligned trace prefix. However, (prefix-)alignment is computationally expensive. This paper proposes a method that, given a trace prefix of an ongoing case, computes its state in constant time on the length of the trace using an index that represents states as $n$-grams. An empirical evaluation shows that the proposed approach has an accuracy comparable to that of the prefix-alignment approach, while achieving a throughput of hundreds of thousands of traces per second.
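To make the idea of an $n$-gram index concrete, the following is a minimal sketch and not the authors' implementation. It assumes the model is given as a small, hypothetical labeled transition system (`GRAPH`, standing in for a reachability graph over markings), and it assumes a simple back-off rule to shorter $n$-grams when the last events of a prefix do not fit the model. The names `build_index` and `lookup_state` are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy reachability graph: state -> {activity: next_state}.
GRAPH = {
    "s0": {"A": "s1"},
    "s1": {"B": "s2", "C": "s3"},
    "s2": {"D": "s4"},
    "s3": {"D": "s4"},
    "s4": {},
}
INITIAL = "s0"


def build_index(graph, max_n):
    """Map each activity sequence of length 1..max_n that labels a path
    in the graph to the set of states in which such a path can end."""
    index = defaultdict(set)
    # Frontier of (n-gram, end state) pairs; paths may start in any state.
    frontier = {((), state) for state in graph}
    for _ in range(max_n):
        next_frontier = set()
        for gram, state in frontier:
            for activity, target in graph[state].items():
                extended = gram + (activity,)
                index[extended].add(target)
                next_frontier.add((extended, target))
        frontier = next_frontier
    return dict(index)


def lookup_state(index, prefix, max_n, initial=INITIAL):
    """Return candidate states for a trace prefix, reading only its
    last max_n events (so the cost does not grow with prefix length)."""
    if not prefix:
        return {initial}
    suffix = tuple(prefix[-max_n:])
    # Assumed back-off: drop the oldest event until the n-gram is known.
    while suffix:
        if suffix in index:
            return index[suffix]
        suffix = suffix[1:]
    return set()  # no suffix of the prefix matches the model


index = build_index(GRAPH, max_n=2)
conforming = lookup_state(index, ["A", "B"], max_n=2)
deviating = lookup_state(index, ["A", "X", "C"], max_n=2)
```

In this sketch the index is built once offline, and each lookup touches at most `max_n` events of the prefix, which is the sense in which the cost is constant in the trace length. The deviating prefix `["A", "X", "C"]` still resolves to a state of the model because the lookup falls back to the 1-gram `("C",)`.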


Source journal

IEEE Transactions on Services Computing COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

11.50

Self-citation rate

6.20%

Publication count

278

Review time

>12 weeks

About the journal: IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It places emphasis on algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services. It also covers areas like composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, collaboration, and more in the realm of Services Computing.