{"title":"Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text","authors":"D. T. K. Geeganage, M. Wynn, A. Hofstede","doi":"10.1145/3640018","DOIUrl":null,"url":null,"abstract":"Through the application of process mining, business processes can be improved on the basis of process execution data captured in event logs. Naturally, the quality of this data determines the quality of the improvement recommendations. Improving data quality is non-trivial and there is great potential to exploit unstructured text, e.g. from notes, reviews, and comments, for this purpose and to enrich event logs. To this end, this paper introduces Text2EL+ a three-phase approach to enrich event logs using unstructured text. In its first phase, events and (case and event) attributes are derived from unstructured text linked to organisational processes. In its second phase, these events and attributes undergo a semantic and contextual validation before their incorporation in the event log. In its third and final phase, recognising the importance of human domain expertise, expert guidance is used to further improve data quality by removing redundant and irrelevant events. Expert input is used to train a Named Entity Recognition (NER) model with customised tags to detect event log elements. The approach applies natural language processing techniques, sentence embeddings, training pipelines and models, as well as contextual and expression validation. Various unstructured clinical notes associated with a healthcare case study were analysed and completeness, concordance, and correctness of the derived event log elements were evaluated through experiments. The results show that the proposed method is feasible and applicable.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"5 8","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Journal of Data and Information Quality","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3640018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Through the application of process mining, business processes can be improved on the basis of process execution data captured in event logs. Naturally, the quality of this data determines the quality of the improvement recommendations. Improving data quality is non-trivial and there is great potential to exploit unstructured text, e.g. from notes, reviews, and comments, for this purpose and to enrich event logs. To this end, this paper introduces Text2EL+ a three-phase approach to enrich event logs using unstructured text. In its first phase, events and (case and event) attributes are derived from unstructured text linked to organisational processes. In its second phase, these events and attributes undergo a semantic and contextual validation before their incorporation in the event log. In its third and final phase, recognising the importance of human domain expertise, expert guidance is used to further improve data quality by removing redundant and irrelevant events. Expert input is used to train a Named Entity Recognition (NER) model with customised tags to detect event log elements. The approach applies natural language processing techniques, sentence embeddings, training pipelines and models, as well as contextual and expression validation. Various unstructured clinical notes associated with a healthcare case study were analysed and completeness, concordance, and correctness of the derived event log elements were evaluated through experiments. The results show that the proposed method is feasible and applicable.