Integrated detection and localization of concept drifts in process mining with batch and stream trace clustering support

IF 2.7 · Zone 3, Computer Science · Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering · Pub Date: 2023-12-02 · DOI: 10.1016/j.datak.2023.102253

Rafael Gaspar de Sousa, Antonio Carlos Meira Neto, Marcelo Fantinato, Sarajane Marques Peres, Hajo Alexander Reijers

Volume 149, Article 102253 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0169023X23001131

Citations: 0

Abstract

Process mining can help organizations by extracting knowledge from event logs. However, process mining techniques often assume business processes are stationary, while actual business processes are constantly subject to change because of the complexity of organizations and their external environment. Thus, addressing process changes over time, known as concept drifts, allows for a better understanding of process behavior and can provide a competitive edge for organizations, especially in an online data stream scenario. Current approaches to handling process concept drift focus primarily on detecting and locating concept drifts, often through an integrated, albeit offline, approach. However, some of these integrated approaches rely on complex data structures related to tree-based process models, usually discovered through algorithms whose results are influenced by specific heuristic rules. Moreover, most of the proposed approaches have not been tested on the public event logs labeled with true concept drifts that are commonly used as benchmarks, making comparative analysis difficult. In this article, we propose an online approach to detect and localize concept drifts in an integrated way using batch and stream trace clustering support. In our approach, cluster models provide input information for both concept drift detection and localization methods. Each cluster abstracts a behavior profile underlying the process and reveals descriptive information about the discovered concept drifts. Experiments with benchmark synthetic event logs with different control-flow changes, as well as with real-world event logs, showed that our approach, when relying on the same clustering model, is competitive with baseline concept drift detection methods. In addition, the experiments showed that our approach is able to correctly locate the detected concept drifts and allows the analysis of such concept drifts through different process behavior profiles.
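As a rough illustration of the general idea in the abstract (behavior profiles feeding both drift detection and drift localization), the following minimal Python sketch may help. It is not the authors' method. It assumes that each trace's "cluster" is simply its exact activity sequence, which stands in for a real trace clustering model. It also assumes that drift is flagged when the total variation distance between the profile distributions of two consecutive fixed-size windows exceeds a threshold. The function names, the window size, and the threshold are all hypothetical.

```python
from collections import Counter


def profile_distribution(window):
    """Relative frequency of each behavior profile in a window of traces.

    Assumption: a profile is the exact activity sequence (trace variant),
    used here as a stand-in for the output of a trace clustering model.
    """
    counts = Counter(tuple(trace) for trace in window)
    total = sum(counts.values())
    return {profile: n / total for profile, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def detect_drifts(traces, window_size=4, threshold=0.5):
    """Compare consecutive non-overlapping windows of traces.

    Returns one record per flagged drift with the index of the first trace
    of the later window (detection) and the per-profile change in relative
    frequency (a simple form of localization).
    """
    windows = [
        traces[i:i + window_size]
        for i in range(0, len(traces) - window_size + 1, window_size)
    ]
    drifts = []
    for idx in range(1, len(windows)):
        before = profile_distribution(windows[idx - 1])
        after = profile_distribution(windows[idx])
        distance = total_variation(before, after)
        if distance > threshold:
            changes = {
                k: after.get(k, 0.0) - before.get(k, 0.0)
                for k in set(before) | set(after)
                if after.get(k, 0.0) != before.get(k, 0.0)
            }
            drifts.append({
                "trace_index": idx * window_size,
                "distance": distance,
                "profile_changes": changes,
            })
    return drifts


# Toy log: activities b and c swap order after the eighth trace.
log = [("a", "b", "c")] * 8 + [("a", "c", "b")] * 8
for drift in detect_drifts(log):
    print(drift)
```

In this toy log the only flagged drift starts at trace index 8, and the profile changes show which behavior disappeared and which appeared. A stream setting would update the windows and the clustering incrementally rather than slicing a complete list.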


Source journal

Data & Knowledge Engineering · Engineering & Technology: Computer Science, Artificial Intelligence

CiteScore

5.00

Self-citation rate

0.00%

Articles published

Review time

6 months

About the journal: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.