Foundations and practice of binary process discovery

IF 3 2区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Tijs Slaats , Søren Debois , Christoffer Olling Back , Axel Kjeld Fjelrad Christfort
{"title":"Foundations and practice of binary process discovery","authors":"Tijs Slaats ,&nbsp;Søren Debois ,&nbsp;Christoffer Olling Back ,&nbsp;Axel Kjeld Fjelrad Christfort","doi":"10.1016/j.is.2023.102339","DOIUrl":null,"url":null,"abstract":"<div><p>Most contemporary process discovery methods take as inputs only <em>positive</em> examples of process executions, and so they are <em>one-class classification</em> algorithms. However, we have found <em>negative</em> examples to also be available in industry, hence we build on earlier work that treats process discovery as a <em>binary classification</em> problem. This approach opens the door to many well-established methods and metrics from machine learning, in particular to improve the distinction between what should and should not be allowed by the output model. Concretely, we (1) present a verified formalisation of process discovery as a binary classification problem; (2) provide cases with negative examples from industry, including real-life logs; (3) propose the Rejection Miner binary classification procedure, applicable to any process notation that has a suitable syntactic composition operator; (4) implement two concrete binary miners, one outputting Declare patterns, the other Dynamic Condition Response (DCR) graphs; and (5) apply these miners to real world and synthetic logs obtained from our industry partners and the process discovery contest, showing increased output model quality in terms of accuracy and model size.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"121 ","pages":"Article 102339"},"PeriodicalIF":3.0000,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437923001758/pdfft?md5=f2bf1fcd001426b54f1d43f5ac2ad3d9&pid=1-s2.0-S0306437923001758-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306437923001758","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Most contemporary process discovery methods take as inputs only positive examples of process executions, and so they are one-class classification algorithms. However, we have found negative examples to also be available in industry, hence we build on earlier work that treats process discovery as a binary classification problem. This approach opens the door to many well-established methods and metrics from machine learning, in particular to improve the distinction between what should and should not be allowed by the output model. Concretely, we (1) present a verified formalisation of process discovery as a binary classification problem; (2) provide cases with negative examples from industry, including real-life logs; (3) propose the Rejection Miner binary classification procedure, applicable to any process notation that has a suitable syntactic composition operator; (4) implement two concrete binary miners, one outputting Declare patterns, the other Dynamic Condition Response (DCR) graphs; and (5) apply these miners to real world and synthetic logs obtained from our industry partners and the process discovery contest, showing increased output model quality in terms of accuracy and model size.

二进制过程发现的基础与实践
大多数当代流程发现方法仅将流程执行的正面示例作为输入,因此属于单类分类算法。然而,我们发现工业中也有负面示例,因此我们在早期工作的基础上,将流程发现视为二元分类问题。这种方法为机器学习中许多成熟的方法和指标打开了大门,特别是改进了输出模型应该允许和不应该允许的内容之间的区别。具体来说,我们(1)将流程发现形式化为一个二元分类问题,并进行了验证;(2)提供了来自行业的负面案例,包括现实生活中的日志;(3)提出了拒绝矿工二元分类程序,该程序适用于任何具有合适语法组成算子的流程符号;(4) 实现两个具体的二进制矿工,一个输出声明模式,另一个输出动态条件响应(DCR)图;以及 (5) 将这些矿工应用于从我们的行业合作伙伴和流程发现竞赛中获得的真实世界和合成日志,结果显示在准确性和模型大小方面输出模型的质量都有所提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Information Systems
Information Systems 工程技术-计算机:信息系统
CiteScore
9.40
自引率
2.70%
发文量
112
审稿时长
53 days
期刊介绍: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems such as new data models, performance enhancements, and show how those innovations contribute to the goals of the application.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信