A survey of open-source data quality tools: shedding light on the materialization of data quality dimensions in practice
Vasileios Papastergios, Anastasios Gounaris
arXiv - CS - Databases, 2024-07-26
DOI: arxiv-2407.18649 (https://doi.org/arxiv-2407.18649)
Abstract
Data Quality (DQ) describes the degree to which data characteristics meet
requirements and are fit for use by humans and/or systems. DQ can be measured
along several aspects, called DQ dimensions (e.g., accuracy, completeness,
and consistency), also referred to as characteristics in the literature.
The ISO/IEC 25012 standard defines a data quality model with fifteen
such dimensions, setting the requirements a data product should meet. In this
short report, we aim to bridge the gap between lower-level functionalities
offered by DQ tools and higher-level dimensions in a systematic manner,
revealing the many-to-many relationships between them. To this end, we examine
six open-source DQ tools and focus on providing a mapping between the
functionalities they offer and the DQ dimensions, as defined by the ISO
standard. Wherever applicable, we also provide insights into the software
engineering details that the tools leverage to address DQ challenges.