Informatics assessment of COVID-19 data collection: an analysis of UK Biobank questionnaire data.

Impact factor 3.3; Zone 3, Medicine; JCR Q2, Medical Informatics
Craig S Mayer
BMC Medical Informatics and Decision Making 24(1):321. Journal article, published 2024-10-31.
DOI: 10.1186/s12911-024-02743-5 (https://doi.org/10.1186/s12911-024-02743-5)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529153/pdf/
Citations: 0

Abstract

Background: Many existing data collection initiatives have been expanded to include COVID-19-related data. One such program is UK Biobank, a large-scale research and biomedical data collection resource that added several COVID-19-related data fields, including questionnaires (exposures and symptoms), viral testing, and serological data. This study analyzed these data to understand how COVID-19 data were collected, how they can be used to attribute COVID-19 to participants, and how responses differ across cohorts and time periods.

Methods: A cohort of COVID-19-infected individuals was defined from the UK Biobank population using viral testing, diagnosis, and self-reported data. Changes from March 2020 to October 2021 in total case counts, and in case counts by identification source (diagnosis from EHR, measurement from viral testing, and self-report from questionnaire), were analyzed. The structure and dynamics of the questionnaires were also analyzed, including the number and type of questions asked, how often and by how many individuals each question was answered, and what responses were given. In addition, the number of individuals who provided responses for the different time segments covered by the questionnaire was calculated, along with how often responses changed. Changes in population-level responses over time were examined. The analyses were repeated separately for COVID and non-COVID individuals, and the responses of the two groups were compared.
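
The cohort definition described in Methods amounts to a union of participant identifiers across the three identification sources. The following is a minimal Python sketch of that idea only; the source names (`diagnosis`, `viral_test`, `self_report`), the function `build_cohort`, and the toy IDs are hypothetical and are not UK Biobank field names or the study's actual pipeline.

```python
from collections import Counter


def build_cohort(sources):
    """Combine per-source participant IDs into one cohort.

    sources: dict mapping a source name to a set of participant IDs.
    Returns (cohort, per_source_counts, multi_source_ids).
    """
    # A participant belongs to the cohort if any source identifies them.
    cohort = set().union(*sources.values())
    # Per-source counts can sum to more than the cohort size because of overlap.
    per_source = {name: len(ids) for name, ids in sources.items()}
    # Count how many sources identify each participant.
    hits = Counter(pid for ids in sources.values() for pid in ids)
    multi = {pid for pid, n in hits.items() if n > 1}
    return cohort, per_source, multi


# Toy data for illustration only, not real UK Biobank records.
sources = {
    "diagnosis": {1, 2, 3},
    "viral_test": {2, 3, 4},
    "self_report": {5},
}
cohort, per_source, multi = build_cohort(sources)
```

Counting distinct participants after the union, rather than summing per-source counts, is what keeps individuals identified by several sources from being counted more than once.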

Results: There were 62 042 distinct participants who had COVID-19: 49 120 identified through diagnosis, 30 553 through viral testing, and 934 through self-reporting, with many identified by more than one method. Overall case counts and the distribution of case data sources changed substantially over time. Of the 9 952 participants who completed the exposure questionnaire, 6 899 responded regarding every time period covered by the questionnaire, and responses changed considerably over time. The most commonly changed response was employment situation, which 74.78% of individuals changed between the first and last time of asking. At the population level, face mask usage increased in each successive time period. Nearly every COVID-19 symptom decreased from the first to the second questionnaire. Compared with non-COVID participants, COVID participants were more commonly keyworkers (COVID: 33.76%, non-COVID: 15.00%) and more often lived with young people attending school (COVID: 61.70%, non-COVID: 45.32%).
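
A figure such as the share of individuals whose response changed from the first to the last time of asking can be computed along the following lines. This is a sketch under stated assumptions: the abstract does not say how skipped time periods were handled, so the choice here to ignore unanswered periods and to require at least two answers is an assumption, and `share_changed` and the toy responses are hypothetical.

```python
def share_changed(responses):
    """Percent of participants whose first and last answers differ.

    responses: dict mapping a participant ID to a list of answers ordered
    by time period; None marks a period the participant did not answer.
    Assumption: only participants with at least two answers are counted.
    """
    changed = 0
    eligible = 0
    for answers in responses.values():
        given = [a for a in answers if a is not None]
        if len(given) < 2:
            continue  # cannot compare first and last with a single answer
        eligible += 1
        if given[0] != given[-1]:
            changed += 1
    return 100.0 * changed / eligible if eligible else 0.0


# Toy employment-situation responses, for illustration only.
toy = {
    "a": ["employed", "furloughed"],
    "b": ["employed", None, "retired"],
    "c": ["retired", "retired"],
    "d": ["employed", None, "employed"],
    "e": ["employed"],
}
pct = share_changed(toy)
```

Comparing only the first and last answers mirrors the wording of the abstract; a participant who changed and then reverted, such as a hypothetical employed, furloughed, employed sequence, would not count as changed under this definition.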

Conclusion: Developing a robust cohort of COVID-19 participants from the UK Biobank population required multiple types of data. The differences across time and exposures show the importance of comprehensive data capture and the utility of COVID-19-related questionnaire data.

Source journal
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.