GUIDO: A Hybrid Approach to Guideline Discovery & Ordering from Natural Language Texts

Impact factor: 2.0; JCR quartile: Q3 (Computer Science, Information Systems)

Publication date: 2023-07-19; DOI: 10.5220/0012084400003541

Nils Freyer, Dustin Thewes, Matthias Meinecke

Published in Data, pp. 335-342.


Abstract

Extracting workflow nets from textual descriptions can be used to simplify guidelines or to formalize textual descriptions of formal processes such as business processes and algorithms. Manually extracting processes, however, requires domain expertise and effort. While automatic process model extraction is desirable, annotating texts with formalized process models is expensive, so only a few machine-learning-based extraction approaches exist. Rule-based approaches, in turn, require domain specificity to work well and can rarely distinguish relevant from irrelevant information in textual descriptions. In this paper, we present GUIDO, a hybrid approach to the process model extraction task that first classifies sentences by their relevance to the process model, using a BERT-based sentence classifier, and then extracts a process model from the sentences classified as relevant, using dependency parsing. The presented approach achieves significantly better results than a purely rule-based approach: GUIDO achieves an average behavioral similarity score of 0.93. Still, compared to purely machine-learning-based approaches, the annotation costs stay low.
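The two-stage structure described in the abstract (filter sentences by relevance, then extract steps only from the relevant ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword heuristic below is a stand-in for the BERT-based sentence classifier, the verb-first split is a stand-in for dependency parsing, and all names (`Activity`, `is_relevant`, `extract_activity`, `extract_process`, `ACTION_WORDS`) as well as the sample text are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Activity:
    action: str  # verb naming the step
    obj: str     # remaining words of the step


# Assumption: a tiny keyword list stands in for a trained relevance classifier.
ACTION_WORDS = {"mix", "add", "heat", "stir", "check", "send", "bake"}


def is_relevant(sentence: str) -> bool:
    """Stage 1 stand-in: treat a sentence as relevant if it starts with an action word."""
    words = sentence.lower().rstrip(".!").split()
    return bool(words) and words[0] in ACTION_WORDS


def extract_activity(sentence: str) -> Activity:
    """Stage 2 stand-in: first word is the action, the rest is its object."""
    words = sentence.rstrip(".!").split()
    return Activity(action=words[0].lower(), obj=" ".join(words[1:]))


def extract_process(
    sentences: Iterable[str],
    classify: Callable[[str], bool] = is_relevant,
    extract: Callable[[str], Activity] = extract_activity,
) -> List[Activity]:
    """Run the filter-then-extract pipeline, keeping the original sentence order."""
    return [extract(s) for s in sentences if classify(s)]


# Made-up guideline text mixing relevant steps with irrelevant commentary.
guideline = [
    "This cake was my grandmother's favourite.",
    "Mix the flour and the sugar.",
    "Add two eggs.",
    "It tastes best when served warm.",
    "Bake the batter for 30 minutes.",
]

steps = extract_process(guideline)
for step in steps:
    print(step.action, "->", step.obj)
```

Because both stages are passed in as callables, the stand-ins could be swapped for a real sentence classifier and a dependency-parser-based extractor without changing `extract_process`; the sketch only shows how irrelevant sentences are dropped before any extraction rule runs.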


