GUIDO: A Hybrid Approach to Guideline Discovery & Ordering from Natural Language Texts

IF 2.2 · JCR Q3 · Computer Science, Information Systems
Data · Pub Date: 2023-07-19 · DOI: 10.5220/0012084400003541
Nils Freyer, Dustin Thewes, Matthias Meinecke
Citations: 0

Abstract

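Extracting workflow nets from textual descriptions can be used to simplify guidelines or to formalize textual descriptions of formal processes such as business processes and algorithms. Manually extracting processes, however, requires domain expertise and effort. While automatic process model extraction is desirable, annotating texts with formalized process models is expensive. Therefore, there are only a few machine-learning-based extraction approaches. Rule-based approaches, in turn, require domain specificity to work well and can rarely distinguish between relevant and irrelevant information in textual descriptions. In this paper, we present GUIDO, a hybrid approach to the process model extraction task that, first, classifies sentences by their relevance to the process model using a BERT-based sentence classifier and, second, extracts a process model from the sentences classified as relevant using dependency parsing. The presented approach achieves significantly better results than a pure rule-based approach: GUIDO achieves an average behavioral similarity score of 0.93. Still, compared with purely machine-learning-based approaches, the annotation costs stay low.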
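The abstract describes GUIDO as a two-stage pipeline: a classifier first filters sentences by relevance, and a rule-based extractor then turns the surviving sentences into a process model. The Python sketch below illustrates only that filter-then-extract structure, under loudly stated assumptions: `toy_relevance_classifier` is a hypothetical keyword heuristic standing in for the BERT-based classifier, `toy_extract_activities` is a hypothetical clause splitter standing in for dependency parsing, and every relevant clause is assumed to be one activity in textual order, which yields a purely sequential workflow net (no choices or parallel branches). None of these names come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL stand-in for the BERT-based relevance classifier:
# a sentence counts as relevant if it starts with an action cue word.
ACTION_CUES = {"open", "enter", "click", "check", "send", "save", "review"}


def toy_relevance_classifier(sentence: str) -> bool:
    words = sentence.lower().split()
    return bool(words) and words[0].strip(",;:") in ACTION_CUES


def split_sentences(text: str) -> List[str]:
    # Naive split on periods; real input needs a proper sentence tokenizer.
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_extract_activities(sentence: str) -> List[str]:
    # HYPOTHETICAL stand-in for dependency parsing: clauses joined by
    # "and" or ", then" are read as consecutive activities.
    clauses = sentence.replace(", then ", " and ").split(" and ")
    return [c.strip().lower() for c in clauses if c.strip()]


@dataclass
class WorkflowNet:
    places: List[str] = field(default_factory=list)
    transitions: List[str] = field(default_factory=list)
    labels: Dict[str, str] = field(default_factory=dict)  # transition -> activity
    arcs: List[Tuple[str, str]] = field(default_factory=list)


def build_sequential_net(activities: List[str]) -> WorkflowNet:
    # p0 is the source place and the last place is the sink; each activity
    # becomes one transition between two consecutive places.
    net = WorkflowNet(places=["p0"])
    for i, activity in enumerate(activities):
        t, nxt = f"t{i}", f"p{i + 1}"
        net.transitions.append(t)
        net.labels[t] = activity
        net.places.append(nxt)
        net.arcs.extend([(f"p{i}", t), (t, nxt)])
    return net


def extract_process(text: str, is_relevant: Callable[[str], bool]) -> WorkflowNet:
    # Stage 1: keep only sentences the classifier marks as relevant.
    relevant = [s for s in split_sentences(text) if is_relevant(s)]
    # Stage 2: turn the relevant sentences into ordered activities.
    activities = [a for s in relevant for a in toy_extract_activities(s)]
    return build_sequential_net(activities)


sample = (
    "This guide explains the login process. Open the portal. "
    "Enter your password and click login. "
    "The portal was launched in 2019. Review the dashboard."
)
net = extract_process(sample, toy_relevance_classifier)
print([net.labels[t] for t in net.transitions])
# ['open the portal', 'enter your password', 'click login', 'review the dashboard']
```

Because the classifier is passed in as a plain callable, a real sentence classifier (for instance, a fine-tuned BERT model wrapped in a function returning `True` or `False`) could replace the heuristic without touching the extraction stage. In this sketch the only learned component needs sentence-level relevance labels, which is one way to read the abstract's point that annotation costs stay low.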
Source journal: Data (Data Decision Sciences-Information Systems and Management)
CiteScore: 4.30
Self-citation rate: 3.80%
Articles published: 0
Review time: 10 weeks