Christina Mergenthaler, Jake D Mathewson, Stephanie Lako, Andreas Werle van der Merwe, Matthys Potgieter, Vincent Meurrens, Abdullah Latif, Hasan Tahir, Tanveer Ahmed, Zia Samad, Frank Cobelens, Daniella Brals, Mirjam I Bakker, Ente Rood
BMJ Public Health, volume 3, issue 1, e001424. Published 2025-03-18. DOI: 10.1136/bmjph-2024-001424. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107446/pdf/
Abstract
Predicting communities with high tuberculosis case-finding efficiency to optimise resource allocation in Pakistan: comparing the performance of a negative binomial spatial lag model with a Bayesian machine-learning model.
Introduction: Despite progress in tuberculosis (TB) treatment coverage in recent years, an estimated 183 000 people with TB may not have been diagnosed in Pakistan in 2022. There is therefore a need for models that help steer active case finding (ACF) towards populations with a high probability of having undetected TB. The aim of this study was to cross-validate the TB positivity rate predictions of an existing Bayesian machine learning (BML) model in ACF settings against those of a simpler frequentist model.
Methods: We conducted a retrospective analysis of cross-sectional data to identify predictors for detection of bacteriologically confirmed TB cases during ACF events in Pakistan. A predictive negative binomial regression (NBR) model was created, and the presence of spatial autocorrelation was examined to account for spatial dependencies in the outcome variable. The NBR and BML models were compared on their predictive precision for the identification of TB hotspots, based on root mean square error (RMSE) values, k-fold cross-validation and tehsil-level (sub-district) prediction rankings.
Results: 407 (1.9%) bacteriologically confirmed cases among 21 227 visitors were detected in 414 ACF events between September 2020 and January 2022. In the final NBR, the spatial lag variable explained most variation in TB positivity rates across ACF events. NBR and BML predictions were similar at tehsil level. While the BML had a slightly lower root mean square error (1.02 vs 1.03), the NBR had a slightly better fit based on the Akaike information criterion.
Conclusions: Statistical models can be effective in predicting TB hotspots for ACF planning, and the relatively simple NBR model was nearly as effective as the more complex BML model. The predictions of the different modelling approaches were similar, suggesting that predictions are driven more by covariates than by the modelling framework. The agreement between model results increases confidence in the potential utility of models to spatially target ACF activities in high-need, low-access areas.
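The Methods mention a spatial lag variable and a comparison based on k-fold cross-validated root mean square error. The sketch below illustrates both ideas in plain Python and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the toy coordinates and positivity rates, the definition of the lag as the mean rate of the k nearest other events, and the mean-only predictor that stands in for the NBR and BML models.

```python
import math


def spatial_lag(coords, values, k=3):
    """For each site, return the mean value of its k nearest other sites.

    Assumes planar (x, y) coordinates and at least two sites.
    """
    lags = []
    for i, (xi, yi) in enumerate(coords):
        # Distances from site i to every other site, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        lags.append(sum(values[j] for j in neighbours) / len(neighbours))
    return lags


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_rmse(xs, ys, fit, k=5):
    """Average held-out RMSE over k folds (fold f holds out indices f, f+k, ...)."""
    n = len(ys)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        # `fit` receives training data and returns a prediction function.
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [predict(xs[i]) for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], preds))
    return sum(scores) / k


def fit_mean(train_x, train_y):
    """Placeholder model: always predicts the training mean (ignores x)."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


# Hypothetical ACF events: planar positions and observed positivity rates.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
rates = [0.02, 0.04, 0.03, 0.00, 0.01]

lags = spatial_lag(coords, rates, k=2)
cv_error = k_fold_rmse(lags, rates, fit_mean, k=5)
print(lags[0], cv_error)
```

Any other model can be compared under the same folds by passing a different `fit` function to `k_fold_rmse`, so that differences in the held-out RMSE reflect the models and not the data split.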