Deep Learning Prediction of Inflammatory Inducing Protein Coding mRNA in P. gingivalis Released Outer Membrane Vesicles.

IF 2.3 Q3 ENGINEERING, BIOMEDICAL

Biomedical Engineering and Computational Biology Pub Date : 2024-08-30 eCollection Date: 2024-01-01 DOI:10.1177/11795972241277081

Pradeep Kumar Yadalam, Raghavendra Vamsi Anegundi, Muthupandian Saravanan, Hadush Negash Meles, Artak Heboyan

Biomedical Engineering and Computational Biology, volume 15, article 11795972241277081. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11365027/pdf/

Citations: 0

Abstract

Aim: This in silico study uses deep learning algorithms to predict protein-coding P. gingivalis mRNA sequences.

Material and methods: The GEO2R tool was applied to the NCBI GEO dataset GSE218606 to identify the most differentially expressed mRNAs in P. gingivalis outer membrane vesicles (OMVs). GeneMANIA was used to analyze the networks of the differentially expressed genes. Transcriptomics data were collected and labeled as P. gingivalis protein-coding mRNA sequence, pseudogene, lincRNA, or bidirectional promoter lincRNA. After preprocessing, the data were analyzed and predictions were generated in Orange, a machine learning tool. Naive Bayes, neural network, and gradient boosting models were trained and tested on data partitioned into training and testing sets. Cross-validation, model accuracy, and ROC curves were evaluated after model validation.
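The partitioning and cross-validation step described above can be illustrated with a minimal sketch. The study ran this inside Orange; the pure-Python function below is only a stand-in, and the function name `k_fold_indices`, the fold count, and the seed are assumptions made for illustration, not values reported by the authors.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: the fold count and seed are illustrative
    assumptions, not settings taken from the study.
    """
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Every fold except the i-th one forms the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(n_samples=10, k=5):
        print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every labeled sequence contributes to the evaluation once while the model is always scored on data it was not trained on.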

Results: Three models (Neural Network, Naive Bayes, and Gradient Boosting) were evaluated using Area Under the Curve (AUC), Classification Accuracy (CA), F1 Score, Precision, Recall, and Specificity. Gradient Boosting achieved the most balanced performance (AUC: 0.72, CA: 0.41, F1: 0.32) compared with the Neural Network (AUC: 0.721, CA: 0.391, F1: 0.314) and Naive Bayes (AUC: 0.701, CA: 0.172, F1: 0.114). Although statistical tests revealed no significant differences between the models, Gradient Boosting exhibited a more balanced precision-recall relationship.
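For readers unfamiliar with the reported metrics, the sketch below shows how they are conventionally computed for a single class treated as positive against the rest. The confusion counts and scores in the usage lines are hypothetical and are not data from the study; the function names `binary_metrics` and `roc_auc` are likewise illustrative. How the per-class values were averaged across the four labels is not stated in the abstract, so no averaging is shown.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute CA, precision, recall, specificity and F1 from confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "CA": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "F1": f1,
    }


def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (the rank-based definition of AUC).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical counts and scores, chosen only to exercise the functions.
    print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
    print(roc_auc([0.9, 0.6, 0.4], [0.5, 0.3]))
```

Because AUC depends only on how the model ranks positives against negatives while CA and F1 depend on the final class assignments, a model can show a moderate AUC together with a low CA, which is the pattern in the figures reported above.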

Conclusion: In silico analysis using machine learning techniques predicted protein-coding mRNA sequences within Porphyromonas gingivalis OMVs. Gradient Boosting outperformed the other models (Neural Network, Naive Bayes) by achieving balanced performance across metrics such as AUC, classification accuracy, and precision-recall, which suggests its potential as a reliable tool for protein-coding mRNA prediction in P. gingivalis OMVs.




Source journal

Biomedical Engineering and Computational Biology (ENGINEERING, BIOMEDICAL)

Self-citation rate: 0.00%

Publication volume: (no value given)

Review time: 8 weeks