Paul Quindroit, Mathilde Fruchart, Samuel Degoul, Renaud Périchon, Julien Soula, Romaric Marcilly, Antoine Lamer
{"title":"参考卫生保健数据库中数据质量问题的实用分类法的定义。","authors":"Paul Quindroit, Mathilde Fruchart, Samuel Degoul, Renaud Périchon, Julien Soula, Romaric Marcilly, Antoine Lamer","doi":"10.1055/a-1976-2371","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Health care information systems can generate and/or record huge volumes of data, some of which may be reused for research, clinical trials, or teaching. However, these databases can be affected by data quality problems; hence, an important step in the data reuse process consists in detecting and rectifying these issues. With a view to facilitating the assessment of data quality, we developed a taxonomy of data quality problems in operational databases.</p><p><strong>Material: </strong>We searched the literature for publications that mentioned \"data quality problems,\" \"data quality taxonomy,\" \"data quality assessment,\" or \"dirty data.\" The publications were then reviewed, compared, summarized, and structured using a bottom-up approach, to provide an operational taxonomy of data quality problems. The latter were illustrated with fictional examples (though based on reality) from clinical databases.</p><p><strong>Results: </strong>Twelve publications were selected, and 286 instances of data quality problems were identified and were classified according to six distinct levels of granularity. We used the classification defined by Oliveira et al to structure our taxonomy. The extracted items were grouped into 53 data quality problems.</p><p><strong>Discussion: </strong>This taxonomy facilitated the systematic assessment of data quality in databases by presenting the data's quality according to their granularity. The definition of this taxonomy is the first step in the data cleaning process. The subsequent steps include the definition of associated quality assessment methods and data cleaning methods.</p><p><strong>Conclusion: </strong>Our new taxonomy enabled the classification and illustration of 53 data quality problems found in hospital databases.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 1-02","pages":"19-30"},"PeriodicalIF":1.3000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Definition of a Practical Taxonomy for Referencing Data Quality Problems in Health Care Databases.\",\"authors\":\"Paul Quindroit, Mathilde Fruchart, Samuel Degoul, Renaud Périchon, Julien Soula, Romaric Marcilly, Antoine Lamer\",\"doi\":\"10.1055/a-1976-2371\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>Health care information systems can generate and/or record huge volumes of data, some of which may be reused for research, clinical trials, or teaching. However, these databases can be affected by data quality problems; hence, an important step in the data reuse process consists in detecting and rectifying these issues. With a view to facilitating the assessment of data quality, we developed a taxonomy of data quality problems in operational databases.</p><p><strong>Material: </strong>We searched the literature for publications that mentioned \\\"data quality problems,\\\" \\\"data quality taxonomy,\\\" \\\"data quality assessment,\\\" or \\\"dirty data.\\\" The publications were then reviewed, compared, summarized, and structured using a bottom-up approach, to provide an operational taxonomy of data quality problems. The latter were illustrated with fictional examples (though based on reality) from clinical databases.</p><p><strong>Results: </strong>Twelve publications were selected, and 286 instances of data quality problems were identified and were classified according to six distinct levels of granularity. We used the classification defined by Oliveira et al to structure our taxonomy. The extracted items were grouped into 53 data quality problems.</p><p><strong>Discussion: </strong>This taxonomy facilitated the systematic assessment of data quality in databases by presenting the data's quality according to their granularity. The definition of this taxonomy is the first step in the data cleaning process. The subsequent steps include the definition of associated quality assessment methods and data cleaning methods.</p><p><strong>Conclusion: </strong>Our new taxonomy enabled the classification and illustration of 53 data quality problems found in hospital databases.</p>\",\"PeriodicalId\":49822,\"journal\":{\"name\":\"Methods of Information in Medicine\",\"volume\":\"62 1-02\",\"pages\":\"19-30\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Methods of Information in Medicine\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1055/a-1976-2371\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Methods of Information in Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1055/a-1976-2371","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Definition of a Practical Taxonomy for Referencing Data Quality Problems in Health Care Databases.
Introduction: Health care information systems can generate and/or record huge volumes of data, some of which may be reused for research, clinical trials, or teaching. However, these databases can be affected by data quality problems; hence, an important step in the data reuse process consists in detecting and rectifying these issues. With a view to facilitating the assessment of data quality, we developed a taxonomy of data quality problems in operational databases.
Material: We searched the literature for publications that mentioned "data quality problems," "data quality taxonomy," "data quality assessment," or "dirty data." The publications were then reviewed, compared, summarized, and structured using a bottom-up approach, to provide an operational taxonomy of data quality problems. The latter were illustrated with fictional examples (though based on reality) from clinical databases.
Results: Twelve publications were selected, and 286 instances of data quality problems were identified and were classified according to six distinct levels of granularity. We used the classification defined by Oliveira et al to structure our taxonomy. The extracted items were grouped into 53 data quality problems.
Discussion: This taxonomy facilitated the systematic assessment of data quality in databases by presenting the data's quality according to their granularity. The definition of this taxonomy is the first step in the data cleaning process. The subsequent steps include the definition of associated quality assessment methods and data cleaning methods.
Conclusion: Our new taxonomy enabled the classification and illustration of 53 data quality problems found in hospital databases.
期刊介绍:
Good medicine and good healthcare demand good information. Since the journal''s founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal''s issue.