Machine learning approaches to predict drug resistance in tuberculosis.

A T Subalakshmi, Arundhati Mahesh
Journal: Computational biology and chemistry, vol. 120 Pt 2, article 108705
Publication date: 2025-09-30
Publication type: Journal Article
DOI: 10.1016/j.compbiolchem.2025.108705 (https://doi.org/10.1016/j.compbiolchem.2025.108705)

Abstract

Tuberculosis (TB) remains a global health crisis, with 10.8 million cases and 1.25 million deaths in 2023. The rise of drug-resistant TB has complicated treatment, while traditional diagnostic methods are limited in speed, cost, and accuracy. This study explores machine learning (ML) models that predict drug resistance from genomic variants, offering a faster and more comprehensive alternative. We compiled a dataset of variations and mutations associated with resistance phenotypes from sources such as TBDReaMDB, GMTV, WHO, and CARD. For each mutation, we derived both sequence-based features (e.g., physicochemical property changes, PROVEAN scores) and structure-based features (e.g., hydrophobicity, flexibility, accessible surface area). Ensemble ML models (Stacking, Bagging, and Voting classifiers) were evaluated for their ability to predict resistance to key anti-TB drugs: fluoroquinolones, rifampicin, isoniazid, and pyrazinamide. Model performance differed across the six TB resistance genes (gyrA, gyrB, inhA, katG, rpoB, pncA), with accuracy ranging from 66% (gyrA, Stacking) to 91.37% (pncA, Voting) and ROC scores ranging from 0.69 (gyrA, Bagging) to 0.92 (pncA, Stacking). The Bagging model performed best for gyrA, gyrB, and rpoB, with strong classification performance, while the Stacking classifier performed well for inhA. The Voting classifier was the top performer for katG and pncA. The top-performing model for both genes was chosen, emphasizing a gene-specific strategy to maximize resistance-prediction performance. This study demonstrates that gene-specific ensemble models, supported by a comprehensive feature set, can provide valuable predictions of drug resistance in M. tuberculosis. While promising, the findings remain a proof of concept and require further validation on larger and more diverse clinical datasets before clinical application.
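
The ensemble comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the per-mutation feature table, and the base learners, hyperparameters, train/test split, and the use of ROC AUC as the selection criterion are all assumptions that the abstract does not specify. The helper names `base_learners`, `build_ensembles`, and `evaluate_gene` are hypothetical.

```python
# Illustrative sketch only: synthetic data stands in for the per-mutation
# feature table, and none of these settings are taken from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def base_learners():
    """Fresh base estimators (an assumed choice) for stacking and voting."""
    return [
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ]


def build_ensembles():
    """Return one instance of each ensemble type named in the abstract."""
    return {
        "Stacking": StackingClassifier(
            estimators=base_learners(),
            final_estimator=LogisticRegression(max_iter=1000),
            cv=5,
        ),
        # The base estimator is passed positionally so the call works across
        # scikit-learn versions that name this parameter differently.
        "Bagging": BaggingClassifier(
            DecisionTreeClassifier(), n_estimators=50, random_state=0
        ),
        "Voting": VotingClassifier(estimators=base_learners(), voting="soft"),
    }


def evaluate_gene(X, y):
    """Fit each ensemble on one gene's data; return {name: (accuracy, roc_auc)}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    scores = {}
    for name, model in build_ensembles().items():
        model.fit(X_train, y_train)
        # Probability of the positive ("resistant") class for ROC AUC.
        prob_resistant = model.predict_proba(X_test)[:, 1]
        scores[name] = (
            accuracy_score(y_test, model.predict(X_test)),
            roc_auc_score(y_test, prob_resistant),
        )
    return scores


# Synthetic stand-in for one gene: rows are mutations, columns are features.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)
scores = evaluate_gene(X, y)

# Gene-specific selection: keep the ensemble with the highest ROC AUC
# (an assumed criterion).
best = max(scores, key=lambda name: scores[name][1])
for name, (acc, auc) in scores.items():
    print(f"{name}: accuracy={acc:.3f}, ROC AUC={auc:.3f}")
print("Selected for this gene:", best)
```

Repeating `evaluate_gene` once per gene, each with its own feature table, would give the kind of per-gene model selection the abstract describes. The scores printed here reflect only the synthetic data and say nothing about the reported results.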
