Nguyen Ky Phat, Yoonah Lee, Dinh Hoa Vu, Nguyen Phuoc Long, Seongoh Park
Risk factors for tuberculosis treatment outcomes: a statistical learning-based exploration using the SINAN database with incomplete observations. BMC Medical Informatics and Decision Making, 25(1):301. Published 2025-08-11. DOI: 10.1186/s12911-025-03139-9
Abstract
Background: Understanding early predictors of treatment outcomes allows better outcome prediction and resource allocation for efficient tuberculosis (TB) management.
Objectives: This study aimed to predict treatment outcomes of TB patients from a real-world, population-wide health record dataset with a substantial proportion of incomplete observations. It also investigated potential risk factors associated with death during TB treatment.
Methods: We used an upweighting approach to address the extreme imbalance in the responses and multiple imputation analysis (MIA) to address the missing data. Three algorithms were employed for TB treatment outcome prediction: logistic regression (LOGIT), random forest, and stochastic gradient boosting. The three models exhibited similar performance in predicting the treatment outcomes. The LOGIT model was then interpreted by computing adjusted odds ratios (aORs), and the interpretation results were compared between MIA and complete case analysis (CCA).
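As an illustration of the upweighting idea mentioned in the Methods, the sketch below assigns each class a weight inversely proportional to its frequency, so that the rare outcome contributes as much to the fitting criterion as the common one. The abstract does not state the exact weighting scheme used in the study; the formula n / (k * n_c), the function name, and the toy outcome vector are all assumptions made for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class equal to n / (k * n_c).

    n is the number of observations, k the number of classes, and n_c the
    number of observations in class c. This is one common upweighting
    heuristic, assumed here; it is not taken from the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Hypothetical toy outcome vector: 1 = death (rare), 0 = any other outcome.
y = [0] * 95 + [1] * 5

weights = balanced_class_weights(y)

# One weight per observation, ready to pass to a model that accepts
# per-sample weights during fitting.
sample_weights = [weights[label] for label in y]
```

With these weights each class carries the same total weight, which is the property that counteracts the imbalance.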
Results: MIA was an appropriate method for coping with missing data. Compared to CCA, the MIA-derived LOGIT identified more statistically significant covariates associated with TB treatment outcomes. In MIA, the following factors were associated with increased odds of death: a clinical form involving both pulmonary and extrapulmonary TB [aOR = 3.077, 95% confidence interval (CI) = 2.994-3.163], retreatment after abandonment (aOR = 2.272, 95% CI = 2.209-2.338), and the absence of isoniazid (aOR = 2.072, 95% CI = 1.892-2.269) or rifampicin (aOR = 1.968, 95% CI = 1.746-2.218) in the treatment regimen.
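For readers unfamiliar with how an aOR and its interval can be obtained after multiple imputation, the following is a minimal sketch. It pools one logistic regression coefficient across imputed datasets with Rubin's rules and then exponentiates the result. The coefficient values are hypothetical, the function names are invented for this example, and the interval uses a normal approximation (z = 1.96) rather than the t-based reference distribution often paired with Rubin's rules; none of this is confirmed as the authors' exact procedure.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and its total variance, where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient into an odds ratio with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical log-odds coefficients for one covariate, one per imputed
# dataset, with their squared standard errors (not values from the paper).
betas = [0.80, 0.82, 0.78]
variances = [0.0004, 0.0004, 0.0004]

pooled_beta, total_var = pool_rubin(betas, variances)
aor, ci_low, ci_high = odds_ratio_ci(pooled_beta, math.sqrt(total_var))

print(f"aOR = {aor:.3f}, 95% CI = {ci_low:.3f}-{ci_high:.3f}")
```

Because the interval is built on the log-odds scale and then exponentiated, it is symmetric around the coefficient but not around the aOR itself.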
Conclusion: Our results shed light on the potential risk factors for death during TB treatment and support the use of the simple yet interpretable LOGIT model for predicting TB treatment outcomes.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.