Daniel S Farrar, Lisa G Pell, Yasin Muhammad, Sher Hafiz Khan, Lauren Erdman, Diego G Bassani, Zachary Tanner, Imran Ahmed Chauhadry, Muhammad Karim, Falak Madhani, Shariq Paracha, Masood Ali Khan, Sajid Soofi, Monica Taljaard, Rachel F Spitzer, Sarah M Abu Fadaleh, Zulfiqar A Bhutta, Shaun K Morris
BMJ Public Health, volume 3(1), e001255. Published 2025-04-28. DOI: https://doi.org/10.1136/bmjph-2024-001255. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12039044/pdf/
Estimation of unconfirmed COVID-19 cases from a cross-sectional survey of >10 000 households and a symptom-based machine learning model in Gilgit-Baltistan, Pakistan.
Introduction: Robust estimates of COVID-19 prevalence in settings with limited capacity for SARS-CoV-2 molecular and serologic testing are scarce. We aimed to describe the epidemiology of confirmed and probable COVID-19 in Gilgit-Baltistan, and to develop a symptom-based predictive model to identify infected but undiagnosed individuals with COVID-19.
Methods: We conducted a cross-sectional survey in 10 257 randomly selected households in Gilgit-Baltistan from June to August 2021. Data regarding SARS-CoV-2 testing, healthcare worker (HCW) diagnoses, symptoms and outcomes since March 2020 were self-reported by households. 'Confirmed/probable' infection was defined as a positive test, HCW COVID-19 diagnosis or HCW pneumonia diagnosis with COVID-19-positive contact. Robust Poisson regression was conducted to assess differences in symptoms, outcomes and SARS-CoV-2 testing rates. We developed a symptom-based machine learning model to differentiate confirmed/probable infections from those with negative tests. We applied this model to untested respondents to estimate the total prevalence of SARS-CoV-2 infection.
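The case definition in the Methods can be sketched as a small labelling function. This is only an illustration: the field names and the `Respondent` type are hypothetical, and the rule that a confirmed/probable label takes precedence over a negative test is an assumption, since the abstract does not say how overlapping cases were handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    # All field names are hypothetical; the survey's actual variables are not given.
    test_result: Optional[str] = None      # "positive", "negative", or None if never tested
    hcw_covid_diagnosis: bool = False      # healthcare worker (HCW) diagnosed COVID-19
    hcw_pneumonia_diagnosis: bool = False  # HCW diagnosed pneumonia
    covid_positive_contact: bool = False   # contact with a COVID-19-positive person


def classify(r: Respondent) -> str:
    """Assign one of the three groups described in the abstract."""
    confirmed_or_probable = (
        r.test_result == "positive"
        or r.hcw_covid_diagnosis
        or (r.hcw_pneumonia_diagnosis and r.covid_positive_contact)
    )
    if confirmed_or_probable:
        # Assumption: this label wins even if the person also had a negative test.
        return "confirmed/probable"
    if r.test_result == "negative":
        return "negative test"
    return "untested"


if __name__ == "__main__":
    print(classify(Respondent(hcw_pneumonia_diagnosis=True, covid_positive_contact=True)))
    print(classify(Respondent(test_result="negative")))
    print(classify(Respondent()))
```

Under this reading, the first two groups would form the labelled training data for the symptom-based model, and the "untested" group is the one the fitted model is then applied to.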
Results: Data were collected for 77 924 people. Overall, 314 (0.5%) had confirmed/probable infections, 3263 (4.4%) had negative tests and 74 347 (95.1%) were untested. Children were tested less often than adults (adjusted prevalence ratio (aPR) 0.08, 95% CI 0.06 to 0.12 for ages 1-4 years vs 30-39 years), while males were tested more often than females (aPR 1.51, 95% CI 1.40 to 1.63). In the predictive model, area under the receiver operating characteristic curve was 0.92 (95% CI 0.90 to 0.93). We estimate there were 8-17 total SARS-CoV-2 infections for each positive test (8-17:1). The ratio of estimated to confirmed cases was higher for ages 1-4 years (211-480:1), 5-9 years (80-185:1) and for females (13-25:1).
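For readers less familiar with the two summary quantities reported above, the following minimal sketch shows how an area under the receiver operating characteristic curve and an infections-per-positive-test ratio can be computed. It is not the authors' code: the function names are hypothetical, the scores and counts are made-up illustrative values rather than study data, and the abstract does not state how the 8-17 range itself was derived.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives a
    higher model score than a randomly chosen negative case (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def infections_per_positive_test(estimated_infections: float, positive_tests: int) -> float:
    """Ratio of estimated total infections to positive tests."""
    return estimated_infections / positive_tests


if __name__ == "__main__":
    # Made-up model scores for illustration only (not from the study).
    pos = [0.9, 0.8, 0.4]
    neg = [0.7, 0.3, 0.2, 0.1]
    print(round(auc(pos, neg), 3))

    # Made-up counts for illustration only (not from the study).
    print(infections_per_positive_test(600, 50))
```

The pairwise form of the AUC is quadratic in the number of cases, which is fine for a demonstration; a rank-based computation would be preferable at the scale of the survey.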
Conclusions: From March 2020 to August 2021, the majority of SARS-CoV-2 infections in Gilgit-Baltistan went unconfirmed, particularly among women and children. Predictive models that incorporate self-reported symptoms may improve understanding of the burden of disease in settings lacking diagnostic capacity.