{"title":"Deep Learning Prediction of Inflammatory Inducing Protein Coding mRNA in P. gingivalis Released Outer Membrane Vesicles.","authors":"Pradeep Kumar Yadalam, Raghavendra Vamsi Anegundi, Muthupandian Saravanan, Hadush Negash Meles, Artak Heboyan","doi":"10.1177/11795972241277081","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>The Insilco study uses deep learning algorithms to predict the protein-coding pg m RNA sequences.</p><p><strong>Material and methods: </strong>The NCBI GEO DATA SET GSE218606's GEO R tool discovered P.G's outer membrane vesicles' most differentially expressed mRNA. Genemania analyzed differentially expressed gene networks. Transcriptomics data were collected and labeled on P. gingivalis protein-coding mRNA sequence and pseudogene, lincRNA, and bidirectional promoter lincRNA. Orange, a machine learning tool, analyzed and predicted data after preprocessing. Naïve Bayes, neural networks, and gradient descent partition data into training and testing sets, yielding accurate results. Cross-validation, model accuracy, and ROC curve were evaluated after model validation.</p><p><strong>Results: </strong>Three models, Neural Networks, Naive Bayes, and Gradient Boosting, were evaluated using metrics like Area Under the Curve (AUC), Classification Accuracy (CA), <i>F</i>1 Score, Precision, Recall, and Specificity. Gradient Boosting achieved a balanced performance (AUC: 0.72, CA: 0.41, <i>F</i>1: 0.32) compared to Neural Networks (AUC: 0.721, CA: 0.391, F1: 0.314) and Naive Bayes (AUC: 0.701, CA: 0.172, F1: 0.114). While statistical tests revealed no significant differences between the models, Gradient Boosting exhibited a more balanced precision-recall relationship.</p><p><strong>Conclusion: </strong>In silico analysis using machine learning techniques successfully predicted protein-coding mRNA sequences within Porphyromonas gingivalis OMVs. Gradient Boosting outperformed other models (Neural Networks, Naive Bayes) by achieving a balanced performance across metrics like AUC, classification accuracy, and precision-recall, suggests its potential as a reliable tool for protein-coding mRNA prediction in P. gingivalis OMVs.</p>","PeriodicalId":42484,"journal":{"name":"Biomedical Engineering and Computational Biology","volume":null,"pages":null},"PeriodicalIF":2.3000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11365027/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Engineering and Computational Biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/11795972241277081","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Aim: The Insilco study uses deep learning algorithms to predict the protein-coding pg m RNA sequences.
Material and methods: The NCBI GEO DATA SET GSE218606's GEO R tool discovered P.G's outer membrane vesicles' most differentially expressed mRNA. Genemania analyzed differentially expressed gene networks. Transcriptomics data were collected and labeled on P. gingivalis protein-coding mRNA sequence and pseudogene, lincRNA, and bidirectional promoter lincRNA. Orange, a machine learning tool, analyzed and predicted data after preprocessing. Naïve Bayes, neural networks, and gradient descent partition data into training and testing sets, yielding accurate results. Cross-validation, model accuracy, and ROC curve were evaluated after model validation.
Results: Three models, Neural Networks, Naive Bayes, and Gradient Boosting, were evaluated using metrics like Area Under the Curve (AUC), Classification Accuracy (CA), F1 Score, Precision, Recall, and Specificity. Gradient Boosting achieved a balanced performance (AUC: 0.72, CA: 0.41, F1: 0.32) compared to Neural Networks (AUC: 0.721, CA: 0.391, F1: 0.314) and Naive Bayes (AUC: 0.701, CA: 0.172, F1: 0.114). While statistical tests revealed no significant differences between the models, Gradient Boosting exhibited a more balanced precision-recall relationship.
Conclusion: In silico analysis using machine learning techniques successfully predicted protein-coding mRNA sequences within Porphyromonas gingivalis OMVs. Gradient Boosting outperformed other models (Neural Networks, Naive Bayes) by achieving a balanced performance across metrics like AUC, classification accuracy, and precision-recall, suggests its potential as a reliable tool for protein-coding mRNA prediction in P. gingivalis OMVs.