Debasish Swapnesh Kumar Nayak, Saswati Mahapatra, Sweta Padma Routray, Swayamprabha Sahoo, Santanu Kumar Sahoo, Mostafa M Fouda, Narpinder Singh, Esma R Isenovic, Luca Saba, Jasjit S Suri, Tripti Swarnkar
{"title":"aiGeneR 1.0: An Artificial Intelligence Technique for the Revelation of Informative and Antibiotic Resistant Genes in <i>Escherichia coli</i>.","authors":"Debasish Swapnesh Kumar Nayak, Saswati Mahapatra, Sweta Padma Routray, Swayamprabha Sahoo, Santanu Kumar Sahoo, Mostafa M Fouda, Narpinder Singh, Esma R Isenovic, Luca Saba, Jasjit S Suri, Tripti Swarnkar","doi":"10.31083/j.fbl2902082","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>There are several antibiotic resistance genes (ARG) for the <i>Escherichia coli (E. coli)</i> bacteria that cause urinary tract infections (UTI), and it is therefore important to identify these ARG. Artificial Intelligence (AI) has been used previously in the field of gene expression data, but never adopted for the detection and classification of bacterial ARG. We hypothesize, if the data is correctly conferred, right features are selected, and Deep Learning (DL) classification models are optimized, then (i) non-linear DL models would perform better than Machine Learning (ML) models, (ii) leads to higher accuracy, (iii) can identify the hub genes, and, (iv) can identify gene pathways accurately. We have therefore designed aiGeneR, the first of its kind system that uses DL-based models to identify ARG in <i>E. coli</i> in gene expression data.</p><p><strong>Methodology: </strong>The aiGeneR consists of a tandem connection of quality control embedded with feature extraction and AI-based classification of ARG. We adopted a cross-validation approach to evaluate the performance of aiGeneR using accuracy, precision, recall, and F1-score. Further, we analyzed the effect of sample size ensuring generalization of models and compare against the power analysis. The aiGeneR was validated scientifically and biologically for hub genes and pathways. We benchmarked aiGeneR against two linear and two other non-linear AI models.</p><p><strong>Results: </strong>The aiGeneR identifies tetM (an ARG) and showed an accuracy of 93% with area under the curve (AUC) of 0.99 (<i>p</i> < 0.05). The mean accuracy of non-linear models was 22% higher compared to linear models. We scientifically and biologically validated the aiGeneR.</p><p><strong>Conclusions: </strong>aiGeneR successfully detected the <i>E. coli</i> genes validating our four hypotheses.</p>","PeriodicalId":73069,"journal":{"name":"Frontiers in bioscience (Landmark edition)","volume":null,"pages":null},"PeriodicalIF":3.3000,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in bioscience (Landmark edition)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.31083/j.fbl2902082","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: There are several antibiotic resistance genes (ARG) for the Escherichia coli (E. coli) bacteria that cause urinary tract infections (UTI), and it is therefore important to identify these ARG. Artificial Intelligence (AI) has been used previously in the field of gene expression data, but never adopted for the detection and classification of bacterial ARG. We hypothesize, if the data is correctly conferred, right features are selected, and Deep Learning (DL) classification models are optimized, then (i) non-linear DL models would perform better than Machine Learning (ML) models, (ii) leads to higher accuracy, (iii) can identify the hub genes, and, (iv) can identify gene pathways accurately. We have therefore designed aiGeneR, the first of its kind system that uses DL-based models to identify ARG in E. coli in gene expression data.
Methodology: The aiGeneR consists of a tandem connection of quality control embedded with feature extraction and AI-based classification of ARG. We adopted a cross-validation approach to evaluate the performance of aiGeneR using accuracy, precision, recall, and F1-score. Further, we analyzed the effect of sample size ensuring generalization of models and compare against the power analysis. The aiGeneR was validated scientifically and biologically for hub genes and pathways. We benchmarked aiGeneR against two linear and two other non-linear AI models.
Results: The aiGeneR identifies tetM (an ARG) and showed an accuracy of 93% with area under the curve (AUC) of 0.99 (p < 0.05). The mean accuracy of non-linear models was 22% higher compared to linear models. We scientifically and biologically validated the aiGeneR.
Conclusions: aiGeneR successfully detected the E. coli genes validating our four hypotheses.