ADDRESSING SELECTION BIAS AND MEASUREMENT ERROR IN COVID-19 CASE COUNT DATA USING AUXILIARY INFORMATION.

Impact factor 1.4; Region 4 (Mathematics); Q2, Statistics & Probability

Annals of Applied Statistics Pub Date : 2023-12-01 Epub Date: 2023-10-30 DOI:10.1214/23-aoas1744

Walter Dempsey

Annals of Applied Statistics, vol. 17, no. 4, pp. 2903-2923. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11210953/pdf/


Abstract

Coronavirus case-count data has influenced government policies and drives most epidemiological forecasts. Limited testing is cited as the key driver behind minimal information on the COVID-19 pandemic. While expanded testing is laudable, measurement error and selection bias are the two greatest problems limiting our understanding of the COVID-19 pandemic; neither can be fully addressed by increased testing capacity. In this paper, we demonstrate their impact on estimation of point prevalence and the effective reproduction number. We show that estimates based on the millions of molecular tests in the US have the same mean squared error as a small simple random sample. To address this, a procedure is presented that combines case-count data and random samples over time to estimate selection propensities based on key covariate information. We then combine these selection propensities with epidemiological forecast models to construct a doubly robust estimation method that accounts for both measurement error and selection bias. This method is then applied to estimate Indiana's active infection prevalence using case-count, hospitalization, and death data with demographic information, a statewide random molecular sample collected from April 25-29, and Delphi's COVID-19 Trends and Impact Survey. We end with a series of recommendations based on the proposed methodology.
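The claim that a very large, self-selected testing sample can be no more informative than a small simple random sample can be illustrated with a toy simulation. The sketch below is not the paper's procedure. Every number in it (population size, symptom rate, infection rates, testing rates) is a hypothetical value chosen for illustration. The selection propensities are treated as known, whereas the paper estimates them from random samples and covariates. Measurement error in the tests is ignored entirely. The sketch compares the naive positivity rate among tested people with an inverse-propensity-weighted (Hajek) estimate, and reports the size of a simple random sample whose variance would equal the naive estimate's squared error.

```python
import random


def simulate(pop_size=200_000, seed=0):
    """Generate a toy population of (symptomatic, infected, tested) triples.

    All rates below are hypothetical illustration values, not figures
    taken from the paper.
    """
    rng = random.Random(seed)
    p_symptomatic = 0.10
    p_infected = {True: 0.15, False: 0.01}  # P(infected | symptomatic?)
    p_tested = {True: 0.50, False: 0.05}    # P(tested | symptomatic?)
    population = []
    for _ in range(pop_size):
        s = rng.random() < p_symptomatic
        y = rng.random() < p_infected[s]
        t = rng.random() < p_tested[s]
        population.append((s, y, t))
    return population, p_tested


def summarize(population, p_tested):
    """Compare naive and propensity-weighted prevalence estimates."""
    true_prev = sum(y for _, y, _ in population) / len(population)

    # Only people who selected into testing are observed.
    tested = [(s, y) for s, y, t in population if t]
    naive = sum(y for _, y in tested) / len(tested)

    # Hajek estimator: weight each tested person by 1 / P(tested | covariate).
    # Assumption: propensities are known exactly here.
    weights = [1.0 / p_tested[s] for s, _ in tested]
    ipw = sum(w * y for w, (_, y) in zip(weights, tested)) / sum(weights)

    # Size n of a simple random sample whose variance p(1-p)/n matches
    # the squared error of the naive estimate.
    squared_error = (naive - true_prev) ** 2
    srs_equivalent_n = true_prev * (1.0 - true_prev) / squared_error

    return {
        "n_tested": len(tested),
        "true_prevalence": true_prev,
        "naive": naive,
        "ipw": ipw,
        "srs_equivalent_n": srs_equivalent_n,
    }


result = summarize(*simulate())
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed rates, symptomatic people are both more likely to be infected and more likely to be tested, so the naive positivity rate overstates prevalence no matter how many tests are run. Reweighting by the selection propensity removes that bias only to the extent the propensity model is right, which is the motivation for pairing it with a second model in a doubly robust estimator.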


Source journal: Annals of Applied Statistics (Social Sciences: Statistics & Probability)

CiteScore: 3.10

Self-citation rate: 5.60%

Articles published: 131

Review time: 6-12 weeks

About the journal: Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, the journal aims to provide a timely and unified forum for all areas of applied statistics.