Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text

Impact factor: 1.5 | JCR quartile: Q3 | Category: Computer Science, Information Systems
D. T. K. Geeganage, M. Wynn, A. Hofstede
DOI: 10.1145/3640018 (https://doi.org/10.1145/3640018)
Journal: ACM Journal of Data and Information Quality
Publication date: 2024-01-10
Publication type: Journal Article
Citations: 0

Abstract

Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text
Through the application of process mining, business processes can be improved on the basis of process execution data captured in event logs. Naturally, the quality of this data determines the quality of the improvement recommendations. Improving data quality is non-trivial, and there is great potential to exploit unstructured text, e.g. from notes, reviews, and comments, for this purpose and to enrich event logs. To this end, this paper introduces Text2EL+, a three-phase approach to enrich event logs using unstructured text. In its first phase, events and (case and event) attributes are derived from unstructured text linked to organisational processes. In its second phase, these events and attributes undergo semantic and contextual validation before their incorporation in the event log. In its third and final phase, recognising the importance of human domain expertise, expert guidance is used to further improve data quality by removing redundant and irrelevant events. Expert input is used to train a Named Entity Recognition (NER) model with customised tags to detect event log elements. The approach applies natural language processing techniques, sentence embeddings, training pipelines and models, as well as contextual and expression validation. Various unstructured clinical notes associated with a healthcare case study were analysed, and the completeness, concordance, and correctness of the derived event log elements were evaluated through experiments. The results show that the proposed method is feasible and applicable.
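The three phases named in the abstract can be pictured with a toy sketch. This is not the authors' implementation: it swaps the trained NER model and sentence embeddings for a regular expression and a fixed vocabulary, and it reduces validation to a timestamp check. The note format, the `KNOWN_ACTIVITIES` set, and every function name below are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str
    activity: str
    timestamp: datetime


# Assumed note format (hypothetical): "<YYYY-MM-DD HH:MM> <free text>".
NOTE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(.+)$")

# Hypothetical vocabulary standing in for a NER model with customised tags.
KNOWN_ACTIVITIES = {"triage", "blood test", "x-ray", "discharge"}


def extract_candidates(case_id, notes):
    """Phase 1 (toy): derive candidate events from free-text notes."""
    candidates = []
    for note in notes:
        match = NOTE_PATTERN.match(note.strip())
        if not match:
            continue  # no leading timestamp, so no event can be placed in time
        stamp, text = match.groups()
        lowered = text.lower()
        for activity in sorted(KNOWN_ACTIVITIES):
            if activity in lowered:
                candidates.append((case_id, activity, stamp))
    return candidates


def validate(candidates):
    """Phase 2 (toy): keep only candidates whose timestamp is a real date."""
    events = []
    for case_id, activity, stamp in candidates:
        try:
            parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # e.g. month 13 passes the regex but is not a valid date
        events.append(Event(case_id, activity, parsed))
    return events


def apply_expert_guidance(events, irrelevant):
    """Phase 3 (toy): drop expert-flagged activities and exact duplicates."""
    seen = set()
    kept = []
    for event in sorted(events, key=lambda e: (e.timestamp, e.activity)):
        if event.activity in irrelevant or event in seen:
            continue
        seen.add(event)
        kept.append(event)
    return kept


def enrich(case_id, notes, irrelevant):
    """Run the three toy phases in order and return the surviving events."""
    candidates = extract_candidates(case_id, notes)
    return apply_expert_guidance(validate(candidates), irrelevant)
```

Calling `enrich("case-1", notes, {"blood test"})` on a handful of notes returns a time-ordered list of `Event` records that could be appended to an existing log. The expert's contribution is modelled here only as a set of activity names to discard, which is far simpler than the expert-trained model the abstract describes.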
Source journal: ACM Journal of Data and Information Quality (Computer Science, Information Systems)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 0