Machine learning approaches to predict drug resistance in tuberculosis.

Computational Biology and Chemistry, Volume 120 Pt 2, Article 108705. Pub Date: 2025-09-30. DOI: 10.1016/j.compbiolchem.2025.108705

A T Subalakshmi, Arundhati Mahesh

Abstract

Tuberculosis (TB) remains a global health crisis, with 10.8 million cases and 1.25 million deaths in 2023. The rise of drug-resistant TB has complicated treatment, while traditional diagnostic methods face limitations in speed, cost, and accuracy. This study explores machine learning (ML) models that predict drug resistance from genomic variants, offering a faster and more comprehensive alternative. We compiled a dataset of variations and mutations associated with resistance phenotypes from sources such as TBDReaMDB, GMTV, WHO, and CARD. For each mutation, both sequence-based features (e.g., physicochemical property changes, PROVEAN scores) and structure-based features (e.g., hydrophobicity, flexibility, accessible surface area) were derived. Ensemble ML models (Stacking, Bagging, and Voting classifiers) were evaluated for their ability to predict resistance to key anti-TB drugs: fluoroquinolones, rifampicin, isoniazid, and pyrazinamide. Model performance differed across the six TB resistance genes (gyrA, gyrB, inhA, katG, rpoB, pncA), with accuracy ranging from 66% (gyrA, Stacking) to 91.37% (pncA, Voting) and ROC scores ranging from 0.69 (gyrA, Bagging) to 0.92 (pncA, Stacking). The Bagging model performed best for gyrA, gyrB, and rpoB, with strong classification performance, while the Stacking classifier performed well for inhA. The Voting classifier was the top performer for katG and pncA. The top-performing model was selected for each gene, emphasizing a gene-specific strategy to maximize resistance prediction. This study demonstrates that gene-specific ensemble models, supported by a comprehensive feature set, can provide valuable predictions of drug resistance in M. tuberculosis. While promising, the findings remain a proof of concept and require further validation on larger and more diverse clinical datasets before clinical application.
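As a rough illustration of what a sequence-based "physicochemical property change" feature might look like, the sketch below turns an amino-acid substitution into two simple deltas. It is not the authors' pipeline: the choice of the Kyte-Doolittle hydropathy index, the simplified charge table (histidine treated as neutral), the `featurize` function, and the `MutationFeatures` record are all assumptions made for illustration. The example mutations are well-known resistance-associated substitutions used only as sample inputs, not values taken from the study's dataset.

```python
import re
from typing import NamedTuple

# Kyte-Doolittle hydropathy index per residue (assumed scale; the abstract
# does not say which physicochemical scales were used).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (assumption: His is neutral).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# Matches substitutions written as <wild type><position><mutant>, e.g. "S450L".
_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class MutationFeatures(NamedTuple):
    """Hypothetical feature record for one amino-acid substitution."""
    gene: str
    position: int
    wild_type: str
    mutant: str
    delta_hydropathy: float  # mutant minus wild type
    delta_charge: int        # mutant minus wild type


def featurize(gene: str, mutation: str) -> MutationFeatures:
    """Derive property-change features from a substitution string."""
    match = _MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in KD_HYDROPATHY or mutant not in KD_HYDROPATHY:
        raise ValueError(f"unknown residue in mutation: {mutation!r}")
    return MutationFeatures(
        gene=gene,
        position=position,
        wild_type=wild_type,
        mutant=mutant,
        delta_hydropathy=round(KD_HYDROPATHY[mutant] - KD_HYDROPATHY[wild_type], 2),
        delta_charge=SIDE_CHAIN_CHARGE.get(mutant, 0) - SIDE_CHAIN_CHARGE.get(wild_type, 0),
    )


if __name__ == "__main__":
    # Illustrative inputs only; not drawn from the study's dataset.
    for gene, mutation in [("rpoB", "S450L"), ("katG", "S315T"), ("gyrA", "D94G")]:
        print(featurize(gene, mutation))
```

Rows like these, one per mutation and kept separate per gene, are the kind of tabular input that Stacking, Bagging, or Voting classifiers could be trained on. The structure-based features mentioned in the abstract would need protein structure data and are not covered by this sketch.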
