Event-Case Correlation for Process Mining using Probabilistic Optimization

Mob. Inf. Syst. | Pub Date: 2022-06-20 | DOI: 10.48550/arXiv.2206.10009

Dina Bayomie, Claudio Di Ciccio, J. Mendling


Citations: 3

Abstract

Process mining supports the analysis of the actual behavior and performance of business processes using event logs, such as sales transactions recorded by an ERP system. An essential requirement is that every event in the log must be associated with a unique case identifier (e.g., the order ID of an order-to-cash process). In reality, however, this case identifier may not always be present, especially when logs are acquired from different systems or extracted from non-process-aware information systems. In such settings, the event log needs to be pre-processed by grouping events into cases, an operation known as event correlation. Existing techniques for correlating events rely on assumptions to make the problem tractable: some assume the generative processes to be acyclic, while others require heuristic information or user input. Moreover, they abstract the log to activities and timestamps, and miss the opportunity to use data attributes. In this paper, we lift these assumptions and propose a new technique called EC-SA-Data based on probabilistic optimization. The technique takes as inputs a sequence of timestamped events (the log without case IDs), a process model describing the underlying business process, and constraints over the event attributes. Our approach returns an event log in which every event is associated with a case identifier. The technique allows users to incorporate rules on process knowledge and data constraints flexibly. The approach minimizes the misalignment between the generated log and the input process model and the variance between activity durations across cases, and maximizes the support of the given data constraints over the correlated log. Our experiments with various real-life datasets show the advantages of our approach over the state of the art.
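To make the inputs, output, and objective described in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes simulated annealing as the probabilistic optimizer (the abstract only says "probabilistic optimization"), a known number of cases, a process model reduced to a single allowed activity sequence, and one data constraint (all events in a case share a `customer` value). The names `EVENTS`, `MODEL_TRACE`, `cost`, and `correlate`, the weight `0.1`, and all parameter values are hypothetical.

```python
import math
import random
from statistics import pvariance

# Hypothetical uncorrelated log: (activity, timestamp, attributes), no case IDs.
EVENTS = [
    ("A", 0, {"customer": "x"}),
    ("A", 1, {"customer": "y"}),
    ("B", 5, {"customer": "x"}),
    ("B", 6, {"customer": "y"}),
    ("C", 9, {"customer": "x"}),
    ("C", 10, {"customer": "y"}),
]
# Stand-in for a process model: one allowed sequence of activities.
MODEL_TRACE = ["A", "B", "C"]


def cost(assignment, events, n_cases):
    """Lower is better: misalignment + constraint violations + duration variance."""
    cases = {c: [] for c in range(n_cases)}
    for event, case_id in zip(events, assignment):
        cases[case_id].append(event)  # events stay in timestamp order

    misaligned = 0
    violations = 0
    gaps = {}  # (activity, next activity) -> elapsed times observed across cases
    for case_events in cases.values():
        acts = [a for a, _, _ in case_events]
        # Crude misalignment: positional mismatches plus length difference.
        misaligned += sum(x != y for x, y in zip(acts, MODEL_TRACE))
        misaligned += abs(len(acts) - len(MODEL_TRACE))
        # Assumed data constraint: one customer per case.
        customers = {attrs["customer"] for _, _, attrs in case_events}
        violations += max(0, len(customers) - 1)
        for (a1, t1, _), (a2, t2, _) in zip(case_events, case_events[1:]):
            gaps.setdefault((a1, a2), []).append(t2 - t1)

    variance = sum(pvariance(g) for g in gaps.values() if len(g) > 1)
    return misaligned + violations + 0.1 * variance


def correlate(events, n_cases, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Assign a case ID to every event by simulated annealing over assignments."""
    rng = random.Random(seed)
    current = [rng.randrange(n_cases) for _ in events]
    current_cost = cost(current, events, n_cases)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        # Neighbor: reassign one randomly chosen event to a random case.
        neighbor = list(current)
        neighbor[rng.randrange(len(events))] = rng.randrange(n_cases)
        neighbor_cost = cost(neighbor, events, n_cases)
        delta = neighbor_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbor, neighbor_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp = max(temp * cooling, 1e-6)
    return best, best_cost
```

Calling `correlate(EVENTS, n_cases=2)` returns a list with one case ID per event together with its cost. Folding constraint violations into a cost to be minimized is used here as a simple equivalent of maximizing constraint support; a realistic setting would replace the positional comparison with a proper alignment against the process model.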

