Definition of a Practical Taxonomy for Referencing Data Quality Problems in Health Care Databases.

Impact factor 1.3; Region 4 (Medicine); JCR Q3 (Computer Science, Information Systems)
Paul Quindroit, Mathilde Fruchart, Samuel Degoul, Renaud Périchon, Julien Soula, Romaric Marcilly, Antoine Lamer
Methods of Information in Medicine, vol. 62, no. 1-02, pp. 19-30. Journal Article. Published 2023-05-01. DOI: 10.1055/a-1976-2371 (https://doi.org/10.1055/a-1976-2371)
Citations: 1

Abstract

Introduction: Health care information systems can generate and/or record huge volumes of data, some of which may be reused for research, clinical trials, or teaching. However, these databases can be affected by data quality problems; hence, an important step in the data reuse process consists in detecting and rectifying these issues. With a view to facilitating the assessment of data quality, we developed a taxonomy of data quality problems in operational databases.

Material: We searched the literature for publications that mentioned "data quality problems," "data quality taxonomy," "data quality assessment," or "dirty data." The publications were then reviewed, compared, summarized, and structured using a bottom-up approach, to provide an operational taxonomy of data quality problems. These problems were illustrated with fictional (though reality-based) examples from clinical databases.

Results: Twelve publications were selected, and 286 instances of data quality problems were identified and classified according to six distinct levels of granularity. We used the classification defined by Oliveira et al. to structure our taxonomy. The extracted items were grouped into 53 data quality problems.

Discussion: This taxonomy facilitated the systematic assessment of data quality in databases by presenting the data's quality according to their granularity. The definition of this taxonomy is the first step in the data cleaning process. The subsequent steps include the definition of associated quality assessment methods and data cleaning methods.

Conclusion: Our new taxonomy enabled the classification and illustration of 53 data quality problems found in hospital databases.
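
As a rough illustration of how a taxonomy organized by granularity could drive a systematic assessment, the sketch below registers a few quality problems, each tagged with a level and a detection function, and groups the findings by level. Everything in it is an assumption made for illustration: the level labels, the three example problems, the column names, and the helper names (`QualityProblem`, `assess`) are hypothetical and are not taken from the paper or from the classification of Oliveira et al.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityProblem:
    name: str
    level: str  # illustrative granularity label, not the paper's wording
    detect: Callable[[list], list]  # returns indices of offending rows


def missing_value(column):
    """Build a detector for empty or absent values in one column."""
    def detect(rows):
        return [i for i, r in enumerate(rows) if r.get(column) in (None, "")]
    return detect


def dates_out_of_order(rows):
    """Flag rows whose discharge date precedes the admission date (ISO strings)."""
    return [
        i for i, r in enumerate(rows)
        if r.get("admission") and r.get("discharge")
        and r["discharge"] < r["admission"]
    ]


def duplicate_rows(rows):
    """Flag every row that appears more than once with identical content."""
    keys = [tuple(sorted(r.items())) for r in rows]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] > 1]


# Hypothetical mini-taxonomy: three problems at three assumed levels.
TAXONOMY = [
    QualityProblem("missing value", "single value", missing_value("birth_date")),
    QualityProblem("inconsistent dates", "single record", dates_out_of_order),
    QualityProblem("duplicate records", "several records", duplicate_rows),
]


def assess(rows):
    """Run every registered detector and group the hits by granularity level."""
    report = {}
    for problem in TAXONOMY:
        hits = problem.detect(rows)
        if hits:
            report.setdefault(problem.level, {})[problem.name] = hits
    return report


# Fictional records, invented for this sketch.
rows = [
    {"patient_id": 1, "birth_date": "1980-02-11",
     "admission": "2021-03-01", "discharge": "2021-03-04"},
    {"patient_id": 2, "birth_date": "",
     "admission": "2021-03-02", "discharge": "2021-03-01"},
    {"patient_id": 1, "birth_date": "1980-02-11",
     "admission": "2021-03-01", "discharge": "2021-03-04"},
]

report = assess(rows)
print(report)
```

Grouping the report by level mirrors the idea of presenting data quality according to granularity; a real assessment would register one detector per problem in the taxonomy and run against actual tables rather than in-memory dictionaries.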

Source journal: Methods of Information in Medicine (Medicine; Computer Science, Information Systems)
CiteScore: 3.70
Self-citation rate: 11.80%
Articles published: 33
Review time: 6-12 weeks
About the journal: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.