{"title":"Classification and quantification of timestamp data quality issues and its impact on data quality outcome","authors":"Rex Ambe","doi":"10.1162/dint_a_00238","DOIUrl":null,"url":null,"abstract":"\n Timestamps play a key role in process mining because it determines the chronology of which events occurred and subsequently how they are ordered in process modelling. The timestamp in process mining gives an insight on process performance, conformance, and modelling. This therefore means problems with the timestamp will result in misrepresentations of the mined process. A few articles have been published on the quantification of data quality problems but just one of the articles at the time of this paper is based on the quantification of timestamp quality problems. This article evaluates the quality of timestamps in event log across two axes using eleven quality dimensions and four levels of potential data quality problems. The eleven data quality dimensions were obtained by doing a thorough literature review of more than fifty process mining articles which focus on quality dimensions. This evaluation resulted in twelve data quality quantification metrics and the metrics were applied to the MIMIC-III dataset as an illustration. The outcome of the timestamp quality quantification using the proposed typology enabled the user to appreciate the quality of the event log and thus makes it possible to evaluate the risk of carrying out specific data cleaning measures to improve the process mining outcome.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"123 2","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1162/dint_a_00238","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Timestamps play a key role in process mining because it determines the chronology of which events occurred and subsequently how they are ordered in process modelling. The timestamp in process mining gives an insight on process performance, conformance, and modelling. This therefore means problems with the timestamp will result in misrepresentations of the mined process. A few articles have been published on the quantification of data quality problems but just one of the articles at the time of this paper is based on the quantification of timestamp quality problems. This article evaluates the quality of timestamps in event log across two axes using eleven quality dimensions and four levels of potential data quality problems. The eleven data quality dimensions were obtained by doing a thorough literature review of more than fifty process mining articles which focus on quality dimensions. This evaluation resulted in twelve data quality quantification metrics and the metrics were applied to the MIMIC-III dataset as an illustration. The outcome of the timestamp quality quantification using the proposed typology enabled the user to appreciate the quality of the event log and thus makes it possible to evaluate the risk of carrying out specific data cleaning measures to improve the process mining outcome.