Machine learning-based approach for identification of new resistance associated mutations from whole genome sequences of Mycobacterium tuberculosis.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Bioinformatics Advances. Pub Date: 2025-03-11. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf050
Ankita Pal, Debasisa Mohanty

Abstract

Motivation: Currently available methods for predicting genotypic drug resistance in Mycobacterium tuberculosis rely on known markers of drug resistance. Machine learning approaches that can discover new resistance markers are therefore needed.

Results: Whole genome sequences with known phenotypic drug resistance profiles were used to train XGBoost and ANN classifiers for five first-line and eight second-line tuberculosis drugs. Benchmarking on a completely independent dataset from the CRyPTIC database showed that the method has high sensitivity (90%-95%) and specificity (94%-99%) for the five first-line drugs, and robust performance for six second-line drugs, with a sensitivity of 77%-89% at over 95% specificity. An explainable AI method, SHapley Additive exPlanations (SHAP), identified resistance mutations for each drug in a completely automated way. The approach not only identified known resistance-associated mutations in agreement with the WHO mutation catalogue, but also predicted more than 100 other potential resistance-associated mutations for the 13 antibiotics in new genes outside the known resistance loci. Identifying new resistance markers opens up the opportunity to discover novel mechanisms of drug resistance.
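
The benchmark figures above are sensitivity and specificity, which for a single drug come directly from comparing predicted labels with phenotypic labels. The sketch below shows that calculation only; it is not the authors' evaluation code, and the function name, the 1 = resistant / 0 = susceptible encoding, and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = resistant, 0 = susceptible.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, called resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, called susceptible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, called resistant

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")

    sensitivity = tp / (tp + fn)  # fraction of resistant isolates detected
    specificity = tn / (tn + fp)  # fraction of susceptible isolates cleared
    return sensitivity, specificity


# Made-up labels for ten hypothetical isolates (not data from the paper).
phenotype = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(phenotype, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting sensitivity "at over 95% specificity", as the abstract does for the second-line drugs, would additionally require choosing a probability threshold on the classifier output; that step is not shown here.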

Availability and implementation: The prediction method has been implemented as the TB-AMRpred web server and a command-line tool, freely available at http://www.nii.ac.in/TB-AMRpred.html and https://github.com/Ankitapal1995/TB-AMRprd.
