Antibiotic resistance prediction and biomarker discovery in Neisseria gonorrhoeae

R. Goyal, Rashmi Chowdhary
Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
DOI: 10.1145/3535508.3545097
Publication date (as listed): 2021-06-22
Citations: 0

Abstract

Antibiotic resistance is a global problem projected to kill 10 million people each year by 2050. The CDC lists Neisseria gonorrhoeae among the most urgent threats in this area, because efficient resistance-detection techniques are severely lacking and only a handful of resistance-causing mutations have been identified so far [2]. Currently, testing N. gonorrhoeae samples for antibiotic resistance depends on culturing the sample in a laboratory. Sensitivity and specificity can reach 85-95% and 100%, respectively, but only under optimal conditions and for urogenital specimens [3]. In this study, eight machine learning models (multi-layer perceptron, support vector machine, random forest classifier, k-nearest neighbors, eXtreme gradient boosting, Gaussian naive Bayes, stochastic gradient descent, and logistic regression) were trained on three datasets describing resistance to azithromycin, ciprofloxacin and cefixime, three drugs of choice against N. gonorrhoeae. Each dataset contained more than 3,000 samples with their corresponding resistance values; each sample was encoded as a binary vector recording the presence or absence of specific unitigs in that sample's genome. This representation differs from standard research in the field, which has almost exclusively used whole-genome sequences. After training, the models' accuracies, sensitivities and specificities were compared and analyzed. Maximum balanced accuracies of 97.6%, 95.9% and 100% were achieved on the azithromycin, ciprofloxacin and cefixime training data, respectively, an improvement over previous work [4]. Fig 1 compares the models' performance on azithromycin resistance; the balanced accuracy of Gaussian naive Bayes (GNB), at 68%, is too low to register on the figure's scale. Fisher's exact test was then used to identify biomarkers, i.e., unitigs with a statistically significant association with antibiotic resistance.
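The model comparison above rests on sensitivity, specificity and balanced accuracy. Balanced accuracy is the unweighted mean of sensitivity (recall on resistant isolates) and specificity (recall on susceptible isolates), which makes it more informative than raw accuracy when the two classes are imbalanced. The sketch below is a minimal, dependency-free Python illustration, assuming resistance is coded as 1 (resistant) and 0 (susceptible); the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study. For binary labels, scikit-learn's `balanced_accuracy_score` computes the same quantity.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Sensitivity, specificity and balanced accuracy for 0/1 labels.

    Assumes 1 = resistant (positive class) and 0 = susceptible.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, called resistant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, called susceptible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, called resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, missed
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Balanced accuracy: unweighted mean of the two per-class recalls.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical toy labels: 4 resistant and 6 susceptible isolates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

On these toy labels, sensitivity is 0.75 and specificity is 5/6, so balanced accuracy is about 0.79, slightly below the plain accuracy of 0.8; the gap widens as the class imbalance grows.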
The feature importances of the top-performing models from the first step were then used to rank these genetic signatures, a novel method of organizing unitigs. Out of 584,362 unitigs, 191, 3,304 and 1 were identified as statistically significant for azithromycin, ciprofloxacin and cefixime, respectively. Most of these genetic regions encode proteins, some of which are likely novel discoveries; examples include DsbA oxidoreductase, FtsJ methyltransferase and pilin glycosyltransferase. These biomarkers offer useful leads for developing point-of-care tests for antibiotic resistance in N. gonorrhoeae, while the ML models can predict resistance directly from genotype sequencing of patient samples [1].
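The two-step biomarker workflow described above (screen each unitig with Fisher's exact test, then rank the significant ones by model feature importance) can be sketched in dependency-free Python. For each unitig, a 2x2 contingency table cross-tabulates presence/absence against resistant/susceptible, and the two-sided p-value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. Several details here are assumptions, not the authors' method: the Bonferroni-corrected threshold `alpha / n_features`, averaging importances across models, and the names `fisher_exact_two_sided`, `rank_biomarkers` and all toy data are hypothetical.

```python
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: unitig present / absent; columns: resistant / susceptible.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Keep tables no more likely than the observed one; the small
        # relative tolerance stops floating-point ties from being dropped.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


def rank_biomarkers(
    X: Sequence[Sequence[int]],
    y: Sequence[int],
    importances: Sequence[Sequence[float]],
    alpha: float = 0.05,
) -> List[int]:
    """Return indices of significant unitigs, most important first.

    X: samples x unitigs presence/absence matrix (0/1).
    y: 1 = resistant, 0 = susceptible.
    importances: one feature-importance vector per model (assumed input).
    """
    n_features = len(X[0])
    threshold = alpha / n_features  # Bonferroni correction (assumption)
    significant = []
    for j in range(n_features):
        a = sum(1 for row, t in zip(X, y) if row[j] == 1 and t == 1)
        b = sum(1 for row, t in zip(X, y) if row[j] == 1 and t == 0)
        c = sum(1 for row, t in zip(X, y) if row[j] == 0 and t == 1)
        d = sum(1 for row, t in zip(X, y) if row[j] == 0 and t == 0)
        if fisher_exact_two_sided(a, b, c, d) < threshold:
            significant.append(j)
    # Mean importance across models (assumed aggregation rule).
    mean_imp = [sum(m[j] for m in importances) / len(importances)
                for j in range(n_features)]
    return sorted(significant, key=lambda j: mean_imp[j], reverse=True)


# Hypothetical toy data: 10 resistant and 10 susceptible isolates, 3 unitigs.
y = [1] * 10 + [0] * 10
col0 = list(y)                          # tracks resistance exactly
col1 = [1, 0] * 10                      # equally common in both classes
col2 = [1] * 9 + [0] + [1] + [0] * 9    # 9/10 resistant, 1/10 susceptible
X = [list(r) for r in zip(col0, col1, col2)]
importances = [[0.2, 0.1, 0.7], [0.4, 0.2, 0.4]]  # two hypothetical models
print(rank_biomarkers(X, y, importances))
```

With the toy data, unitigs 0 and 2 pass the threshold, and unitig 2 is ranked first because its mean importance (0.55) exceeds that of unitig 0 (0.3); unitig 1 appears equally often in both classes and is filtered out. At the scale of hundreds of thousands of unitigs, `scipy.stats.fisher_exact` with vectorized counting would be the more practical choice.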