Comprehensive characterization of concept drifts in process mining

IF 3.4 · Zone 2 (Computer Science) · JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems · Pub Date: 2025-07-19 · DOI: 10.1016/j.is.2025.102584

Alexander Kraus, Han van der Aa

Information Systems, Volume 135, Article 102584. Full text: https://www.sciencedirect.com/science/article/pii/S0306437925000687

Citations: 0

Abstract

Business processes change over time because they are executed in dynamic environments. Such changes can lead to concept drifts: situations in which the characteristics of a business process have changed significantly, so that an event log contains data on different versions of the process. The accuracy and usefulness of process mining results derived from such event logs may be compromised, either because they rely on historical data that no longer reflects current process behavior or because they do not distinguish between process versions. Concept drift detection in process mining therefore aims to identify the drifts recorded in an event log by detecting when they occurred, localizing the process modifications, and characterizing how they manifest over time. This paper focuses on the last of these tasks, drift characterization, which seeks to understand whether changes unfolded suddenly or gradually and whether they form complex patterns such as incremental or recurring drifts. However, current solutions for automatically detecting concept drifts from event logs lack comprehensive characterization capabilities; they mainly focus on detecting drifts and characterizing isolated process changes. This leaves more complex concept drifts, such as incremental and recurring drifts, in which several process changes are interconnected, incompletely understood. This paper overcomes these limitations by introducing an improved taxonomy for characterizing concept drifts and a three-step framework that automatically characterizes concept drifts from event logs. We evaluated the framework in extensive experiments on a large collection of synthetic event logs. The results highlight the effectiveness and accuracy of the proposed framework and show that it outperforms state-of-the-art techniques.
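To make the drift types named in the abstract (sudden, gradual, incremental, recurring) more concrete, the following minimal Python sketch labels a group of already-detected process changes. The `Change` record, the `max_gap` threshold, and the labelling rules are assumptions made purely for illustration; they are not the taxonomy or the three-step framework proposed in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    """One detected process change (hypothetical record, not from the paper)."""
    start: int        # index of the first trace affected by the change
    end: int          # index of the last trace of the transition (== start if abrupt)
    old_version: str  # label of the process version before the change
    new_version: str  # label of the process version after the change


def change_type(change: Change) -> str:
    """A change with no transition period is sudden; otherwise it is gradual."""
    return "sudden" if change.start == change.end else "gradual"


def characterize(changes: List[Change], max_gap: int = 100) -> str:
    """Label a chronologically ordered group of changes as one drift.

    Assumed rules (illustrative only):
      - no changes                                   -> "none"
      - a single change                              -> its own type
      - a previously seen version label reappears    -> "recurring"
      - successive changes at most `max_gap` apart   -> "incremental"
      - anything else                                -> "unrelated"
    """
    if not changes:
        return "none"
    if len(changes) == 1:
        return change_type(changes[0])

    # Track every version observed so far to spot a return to an old version.
    seen = {changes[0].old_version}
    for change in changes:
        if change.new_version in seen:
            return "recurring"
        seen.add(change.new_version)

    # Distance in traces between the end of one change and the start of the next.
    gaps = [nxt.start - cur.end for cur, nxt in zip(changes, changes[1:])]
    if all(gap <= max_gap for gap in gaps):
        return "incremental"
    return "unrelated"
```

Under these assumed rules, three closely spaced changes A→B→C→D would be labelled incremental, while A→B followed by B→A would be labelled recurring. A real technique would derive the changes and version labels from the event log itself, which this sketch takes as given.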


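Source journal

Information Systems (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days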

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers on massively parallel data management, fault tolerance in practice, and special-purpose hardware for data-intensive systems are also welcome, as are manuscripts from application domains such as urban informatics, social and natural science, and the Internet of Things. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.