Xiaoxiao Sun , Chenying Zhao , Dongjin Yu , Yi Xu , Nana Xiao
TeProM: A rule-free method for extracting process from complex text with enhanced coreference handling
Information Sciences, Volume 719, Article 122451. Published 2025-06-26. DOI: 10.1016/j.ins.2025.122451
https://www.sciencedirect.com/science/article/pii/S0020025525005833
Citation count: 0
Abstract
Extracting business process models from textual documents remains a significant challenge in enterprises. Traditional rule-based methods generalize poorly because they depend on customized rule sets, while most machine-learning-based methods focus on simple process documents. This paper presents Text-based Process Modeling (TeProM), a novel method for extracting business process components and their relations from textual descriptions. By adopting a rule-free design, TeProM departs from traditional rule-based systems and leverages a neural network model to address complex coreference phenomena in text, thereby ensuring the accurate mapping of process components within the model. The approach applies to various types of business process documents and is particularly strong on complex textual structures with long-range dependencies. Compared with previous approaches, TeProM effectively addresses the complex logical structures and coreference issues concealed in business process documents. In a multidimensional evaluation, TeProM achieved the best performance against 10 baselines. Evaluations on the PET and SAP-OPC datasets for relation extraction further demonstrated the effectiveness of the proposed method. An annotated dataset of 91 real business process documents is also provided as a resource for future research.
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.