Machine learning approaches to predict drug resistance in tuberculosis
A T Subalakshmi, Arundhati Mahesh
Computational Biology and Chemistry, vol. 120 (Pt 2), article 108705, published 2025-09-30
DOI: 10.1016/j.compbiolchem.2025.108705
Tuberculosis (TB) remains a global health crisis, with 10.8 million cases and 1.25 million deaths in 2023. The rise of drug-resistant TB has complicated treatment, while traditional diagnostic methods are limited in speed, cost, and accuracy. This study explores machine learning (ML) models that predict drug resistance from genomic variants, with the aim of offering a faster and more comprehensive alternative. We compiled a dataset of variants and mutations associated with resistance phenotypes from sources such as TBDReaMDB, GMTV, WHO, and CARD. For each mutation, we derived both sequence-based features (e.g., physicochemical property changes, PROVEAN scores) and structure-based features (e.g., hydrophobicity, flexibility, accessible surface area). Ensemble ML models (Stacking, Bagging, and Voting classifiers) were evaluated for their ability to predict resistance to key anti-TB drugs: fluoroquinolones, rifampicin, isoniazid, and pyrazinamide. Model performance varied across the six resistance genes studied (gyrA, gyrB, inhA, katG, rpoB, pncA), with accuracy ranging from 66% (gyrA, Stacking) to 91.37% (pncA, Voting) and ROC AUC ranging from 0.69 (gyrA, Bagging) to 0.92 (pncA, Stacking). The Bagging classifier performed best for gyrA, gyrB, and rpoB, the Stacking classifier for inhA, and the Voting classifier for katG and pncA. The top-performing model was therefore selected separately for each gene, reflecting a gene-specific strategy to maximize resistance prediction. This study demonstrates that gene-specific ensemble models, supported by a comprehensive feature set, can provide valuable predictions of drug resistance in M. tuberculosis. While promising, the findings remain a proof of concept and require further validation on larger and more diverse clinical datasets before clinical application.
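The abstract does not specify how the ensembles were configured. The following is a rough sketch of the gene-specific strategy it describes: train Stacking, Bagging, and Voting ensembles separately for each gene, score them, and keep the best one per gene. It uses scikit-learn's ensemble classes as an assumed stand-in. The base learners, hyperparameters, 5-fold cross-validation, the use of mean ROC AUC as the selection criterion, and the synthetic `make_classification` data are all illustrative assumptions, not the authors' setup. The gene-to-drug mapping reflects the standard associations for these genes.

```python
# Sketch of gene-specific ensemble selection. ASSUMPTIONS (not from the paper):
# scikit-learn implementations, the base learners and hyperparameters below,
# 5-fold stratified CV, selection by mean ROC AUC, and synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Standard gene -> drug associations for the six genes named in the abstract.
GENE_DRUG = {
    "gyrA": "fluoroquinolones",
    "gyrB": "fluoroquinolones",
    "inhA": "isoniazid",
    "katG": "isoniazid",
    "rpoB": "rifampicin",
    "pncA": "pyrazinamide",
}


def build_models(seed=0):
    """Return the three ensemble strategies, with hypothetical configurations."""
    base = [
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=seed)),
    ]
    return {
        # Stacking: a logistic-regression meta-learner combines base predictions.
        "Stacking": StackingClassifier(
            estimators=base, final_estimator=LogisticRegression(max_iter=1000)
        ),
        # Bagging: many decision trees fit on bootstrap resamples.
        "Bagging": BaggingClassifier(
            DecisionTreeClassifier(), n_estimators=50, random_state=seed
        ),
        # Soft voting: average the base learners' predicted probabilities.
        "Voting": VotingClassifier(estimators=base, voting="soft"),
    }


def select_best_per_gene(datasets, seed=0):
    """For each gene's (X, y), cross-validate every ensemble and keep the one
    with the highest mean ROC AUC. Returns {gene: (name, (accuracy, auc))}."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for gene, (X, y) in datasets.items():
        scores = {}
        for name, model in build_models(seed).items():
            res = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
            scores[name] = (res["test_accuracy"].mean(), res["test_roc_auc"].mean())
        best = max(scores, key=lambda n: scores[n][1])
        results[gene] = (best, scores[best])
    return results


if __name__ == "__main__":
    # Synthetic placeholder data: 150 "mutations" x 10 features per gene.
    # Real inputs would be the sequence- and structure-based feature tables.
    data = {
        gene: make_classification(
            n_samples=150, n_features=10, n_informative=5, random_state=i
        )
        for i, gene in enumerate(GENE_DRUG)
    }
    for gene, (name, (acc, auc)) in select_best_per_gene(data).items():
        print(f"{gene} ({GENE_DRUG[gene]}): {name} "
              f"accuracy={acc:.3f} ROC AUC={auc:.3f}")
```

On synthetic data, the printed winners carry no biological meaning. The point is the structure: one independent model search per gene, which is what allows different ensembles to win for different genes, as the abstract reports.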