Consistency as a Data Quality Measure for German Corona Consensus Items Mapped from National Pandemic Cohort Network Data Collections.

Khalid O Yusuf, Olga Miljukov, Anne Schoneberg, Sabine Hanß, Martin Wiesenfeldt, Melanie Stecher, Lazar Mitrov, Sina Marie Hopff, Sarah Steinbrecher, Florian Kurth, Thomas Bahmer, Stefan Schreiber, Daniel Pape, Anna-Lena Hofmann, Mirjam Kohls, Stefan Störk, Hans Christian Stubbe, Johannes J Tebbe, Johannes C Hellmuth, Johanna Erber, Lilian Krist, Siegbert Rieg, Lisa Pilgram, Jörg J Vehreschild, Jens-Peter Reese, Dagmar Krefting
Methods of Information in Medicine, volume 62 S 01, pages e47-e56. Published 2023-06-01. DOI: 10.1055/a-2006-1086 (https://doi.org/10.1055/a-2006-1086). PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/4d/05/10-1055-a-2006-1086.PMC10306447.pdf

Abstract

Background: As a national effort to better understand the current pandemic, three cohorts collect sociodemographic and clinical data from coronavirus disease 2019 (COVID-19) patients from different target populations within the German National Pandemic Cohort Network (NAPKON). Furthermore, the German Corona Consensus Dataset (GECCO) was introduced as a harmonized basic information model for COVID-19 patients in clinical routine. To compare the cohort data with other GECCO-based studies, data items are mapped to GECCO. As mapping from one information model to another is complex, an additional consistency evaluation of the mapped items is recommended to detect possible mapping issues or source data inconsistencies.

Objectives: The goal of this work is to assure high consistency of research data mapped to the GECCO data model. In particular, it aims to identify contradictions among interdependent GECCO data items of the German national COVID-19 cohorts so that the possible reasons for those contradictions can be investigated. We furthermore aim to enable other researchers to easily perform data quality evaluation on GECCO-based datasets and to adapt the approach to similar data models.

Methods: All suitable data items from each of the three NAPKON cohorts are mapped to the GECCO items. A consistency assessment tool (dqGecco) is implemented following the design of an existing quality assessment framework and retaining the consistency taxonomy that framework defines, including logical and empirical contradictions. Results of the assessment are verified independently on the primary data source.
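
A contradiction check of this kind can be sketched as follows. This is a minimal Python illustration, not the dqGecco implementation. The item names (`ventilated`, `severity`, `fever`, `body_temperature_c`), the 37.0 threshold, and both rules are hypothetical. They are chosen only to show one reading of the two categories: a logical contradiction as a value combination that is impossible by definition, and an empirical contradiction as a combination that is possible but implausible.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def has(record: Record, *items: str) -> bool:
    """True if every listed item is present and not missing (None)."""
    return all(record.get(item) is not None for item in items)


@dataclass(frozen=True)
class Rule:
    name: str
    kind: str                              # "logical" or "empirical"
    applies: Callable[[Record], bool]      # are all needed items available?
    contradicts: Callable[[Record], bool]  # is the value combination contradictory?


# Hypothetical rules for illustration only; not taken from the paper.
RULES: List[Rule] = [
    Rule(
        name="ventilated_but_uncomplicated",
        kind="logical",
        applies=lambda r: has(r, "ventilated", "severity"),
        contradicts=lambda r: r["ventilated"] and r["severity"] == "uncomplicated",
    ),
    Rule(
        name="fever_reported_but_normal_temperature",
        kind="empirical",
        applies=lambda r: has(r, "fever", "body_temperature_c"),
        contradicts=lambda r: r["fever"] and r["body_temperature_c"] < 37.0,
    ),
]


def find_contradictions(
    records: List[Record], rules: List[Rule] = RULES
) -> List[Tuple[int, str, str]]:
    """Return (record index, rule name, rule kind) for every contradiction found."""
    findings = []
    for index, record in enumerate(records):
        for rule in rules:
            # Records with missing items are skipped, not counted as contradictions.
            if rule.applies(record) and rule.contradicts(record):
                findings.append((index, rule.name, rule.kind))
    return findings


SAMPLE_RECORDS: List[Record] = [
    {"ventilated": True, "severity": "uncomplicated", "fever": False, "body_temperature_c": 36.8},
    {"ventilated": False, "severity": "uncomplicated", "fever": True, "body_temperature_c": 36.5},
    {"ventilated": False, "severity": "critical", "fever": True, "body_temperature_c": None},
]

if __name__ == "__main__":
    for finding in find_contradictions(SAMPLE_RECORDS):
        print(finding)
```

Keeping each rule as data (a name, a kind, and two predicates) means a flagged record can be traced back to a specific rule. That traceability is what makes it practical to verify a finding against the primary data source and to decide whether it stems from the mapping or from the source data.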

Results: Our consistency assessment tool helped correct the mapping procedure and revealed remaining contradictory value combinations within COVID-19 symptoms, vital signs, and COVID-19 severity. Consistency rates differ between indicators and cohorts, ranging from 95.84% to 100%.
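
The abstract does not spell out how a consistency rate is computed. One plausible reading, assumed here, is the percentage of evaluable records (those in which every item the indicator needs is present) that show no contradiction. The function name `consistency_rate` and the flag encoding below are hypothetical, and the numbers in the example are made up for illustration; they are not results from the paper.

```python
from typing import Optional, Sequence


def consistency_rate(flags: Sequence[Optional[bool]]) -> Optional[float]:
    """Percentage of evaluable records without a contradiction.

    flags: one entry per record for a single indicator.
           True  = contradiction found
           False = checked and consistent
           None  = not evaluable (a required item is missing)
    Returns None when no record is evaluable.
    """
    evaluable = [flag for flag in flags if flag is not None]
    if not evaluable:
        return None
    consistent = sum(1 for flag in evaluable if not flag)
    return round(100.0 * consistent / len(evaluable), 2)


if __name__ == "__main__":
    # 48 consistent, 2 contradictory, 5 not evaluable (illustrative numbers only).
    flags = [False] * 48 + [True] * 2 + [None] * 5
    print(consistency_rate(flags))
```

Excluding non-evaluable records from the denominator keeps missing data from inflating or deflating the rate. Whether the authors treat missing values this way is an assumption of this sketch.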

Conclusion: An efficient and portable tool capable of discovering inconsistencies in the COVID-19 domain has been developed and applied to three different cohorts. As the GECCO dataset is employed in different platforms and studies, the tool can be directly applied there or adapted to similar information models.

