TeProM: A rule-free method for extracting process from complex text with enhanced coreference handling

IF 6.8; Zone 1, Computer Science; JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2025-06-26 DOI:10.1016/j.ins.2025.122451

Xiaoxiao Sun, Chenying Zhao, Dongjin Yu, Yi Xu, Nana Xiao

Information Sciences, Volume 719, Article 122451. Full text: https://www.sciencedirect.com/science/article/pii/S0020025525005833

Citations: 0

Abstract

Extracting business process models from textual documents remains a significant challenge for enterprises. Traditional rule-based methods generalize poorly because they depend on customized rule sets, while most machine-learning-based methods focus on simple process documents. This paper presents Text-based Process Modeling (TeProM), a novel method for extracting business process components and their relations from textual descriptions. TeProM adopts a rule-free design and uses a neural network model to resolve complex coreference phenomena in text, so that process components are mapped accurately within the model. The approach applies to various types of business process documents and is particularly effective on complex textual structures with long-range dependencies. Compared with previous approaches, TeProM can effectively handle the complex logical structures and coreference issues concealed in business process documents. In a multidimensional evaluation, TeProM achieved the best performance against 10 baselines. Evaluations on the PET and SAP-OPC datasets for relation extraction further demonstrated the effectiveness of the proposed method. An annotated dataset of 91 real business process documents is also provided as a resource for future research.

Source journal

Information Sciences (Engineering & Technology: Computer Science, Information Systems)

CiteScore

14.00

Self-citation rate

17.30%

Articles published

1322

Review time

10.4 months

Journal description: Informatics and Computer Science Intelligent Systems Applications is an international journal that publishes original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and survey contributions. The journal aims to serve a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.