{"title":"Antibiotic resistance prediction and biomarker discovery in Neisseria gonorrhoeae","authors":"R. Goyal, Rashmi Chowdhary","doi":"10.1145/3535508.3545097","journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics"},"publicationDate":"2021-06-22","publicationTypes":"Journal Article","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3535508.3545097"}
Antibiotic resistance prediction and biomarker discovery in Neisseria gonorrhoeae
Antibiotic resistance is a global problem projected to kill 10 million people each year by 2050. The CDC lists Neisseria gonorrhoeae among the most urgent threats in this area because efficient resistance detection techniques are severely lacking and only a handful of resistance-causing mutations have been identified thus far [2]. Currently, testing for antibiotic resistance in N. gonorrhoeae depends on culturing a sample in a lab environment. Sensitivity and specificity may reach 85-95% and 100% respectively, but only under optimal conditions and for urogenital specimens [3].

In this study, eight machine learning models (multi-layer perceptron, support vector machine, random forest classifier, K-nearest neighbors, eXtreme gradient boosting, Gaussian Naive Bayes (GNB), stochastic gradient descent, and logistic regression) were trained on three datasets describing resistance to azithromycin, ciprofloxacin and cefixime, three drugs of choice against N. gonorrhoeae. Each dataset had 3000+ samples with their corresponding resistance values; each sample consisted of a binary series representing the presence or absence of certain unitigs within that sample's genome. This technique differs from standard research in the field, which has almost exclusively used whole-genome sequences. Once the models were trained, their accuracies, sensitivities and specificities were compared and analyzed. Maximum balanced accuracies of 97.6%, 95.9% and 100% were achieved on azithromycin, ciprofloxacin and cefixime training data respectively, an improvement over previous work [4]. As a point of comparison between the models, performance on azithromycin resistance is shown in Fig. 1. The balanced accuracy of GNB, at 68%, is too low to register on the scale.

Subsequently, Fisher's exact test was used to test for biomarkers, i.e. unitigs with a statistically significant association with antibiotic resistance. The feature importances of the top models from the first step were used to rank these genetic signatures, a novel method of unitig organization. Out of 584,362 unitigs, 191, 3304 and 1 were identified as statistically significant for azithromycin, ciprofloxacin and cefixime respectively. The majority of these genetic regions encode proteins, some of which are likely novel discoveries, such as DsbA oxidoreductase, FtsJ methyltransferase, and pilin glycosyltransferase. These biomarkers present useful leads for the development of point-of-care tests for antibiotic resistance in N. gonorrhoeae, while the ML models can predict resistance through direct genotype sequencing of patient samples [1].
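The two quantities the abstract relies on, balanced accuracy and the Fisher's exact test p-value for a single unitig, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names (`balanced_accuracy`, `contingency_table`, `fisher_exact_two_sided`) and the toy vectors are hypothetical, and the two-sided p-value uses the common convention of summing all tables no more probable than the observed one, which the abstract does not specify.

```python
from math import comb


def balanced_accuracy(y_true, y_pred):
    """Return (sensitivity, specificity, balanced accuracy); 1 = resistant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sensitivity = tp / (tp + fn)  # fraction of resistant isolates detected
    specificity = tn / (tn + fp)  # fraction of susceptible isolates cleared
    return sensitivity, specificity, (sensitivity + specificity) / 2


def contingency_table(presence, resistant):
    """Build the 2x2 table (a, b, c, d) for one unitig.

    a: present & resistant     b: present & susceptible
    c: absent & resistant      d: absent & susceptible
    """
    pairs = list(zip(presence, resistant))
    a = sum(1 for u, r in pairs if u == 1 and r == 1)
    b = sum(1 for u, r in pairs if u == 1 and r == 0)
    c = sum(1 for u, r in pairs if u == 0 and r == 1)
    d = sum(1 for u, r in pairs if u == 0 and r == 0)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x resistant isolates carry the unitig
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy data (made up for illustration): 8 isolates, 1 = resistant / unitig present
resistant = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
unitig = [1, 1, 1, 0, 1, 0, 0, 0]

sens, spec, bal_acc = balanced_accuracy(resistant, predicted)
p_value = fisher_exact_two_sided(*contingency_table(unitig, resistant))
print(f"sensitivity={sens:.2f} specificity={spec:.2f} balanced={bal_acc:.2f}")
print(f"Fisher p-value for the toy unitig: {p_value:.4f}")
```

In practice, a library routine such as `scipy.stats.fisher_exact` would normally replace the hand-written test, and the table would be built once per unitig column of the presence/absence matrix.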