Supervised learning approaches for predicting Ebola-Human Protein-Protein interactions
Lopamudra Dey, Sanjay Chakraborty
Gene, volume 942, 149228. Published 2025-03-20 (Epub 2025-01-17). DOI: 10.1016/j.gene.2025.149228
The goal of this work is to predict protein-protein interactions (PPIs) between the Ebola virus and its human host. Because very few databases cover the Ebola virus, we have prepared a comprehensive database of all the PPIs between Ebola virus and human proteins (EbolaInt). Our work focuses on finding new protein-protein interactions between humans and the Ebola virus using state-of-the-art machine learning techniques. The task is a two-class problem with a positive (interacting) dataset and a negative (non-interacting) dataset. These datasets contain various sequence-based human protein features, such as amino acid structure, conjoint triad, and domain-related features. We briefly discuss and apply several well-known supervised learning approaches to predict PPIs between human proteins and Ebola virus proteins, including K-nearest neighbours (KNN), random forest (RF), support vector machine (SVM), and a deep feed-forward multi-layer perceptron (DMLP). We have validated our prediction results using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. We compare the models' accuracy, precision, recall, and F1-score for predicting these PPIs. In our results, DMLP achieves the highest accuracy and predicts 2655 potential human target proteins.
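To make the conjoint triad feature mentioned in the abstract concrete, here is a minimal Python sketch of how such a descriptor is commonly computed from a protein sequence. The seven-class amino acid grouping, the scaling step, and the function name `conjoint_triad` are assumptions for illustration; the abstract does not specify the exact encoding the authors used.

```python
from itertools import product

# Seven amino-acid classes commonly used for the conjoint triad encoding.
# This grouping is an assumption; the abstract does not list the classes.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 possible class triads, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of scaled triad counts (hypothetical helper)."""
    counts = [0] * len(TRIADS)
    # Map residues to class ids; non-standard residues (e.g. X) become None.
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues over the sequence.
    for window in zip(classes, classes[1:], classes[2:]):
        if None in window:
            continue  # skip windows that contain an unknown residue
        counts[TRIAD_INDEX[window]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return [0.0] * len(TRIADS)  # sequence too short or fully unknown
    # One commonly used scaling: (count - min) / max, assumed here.
    return [(c - lo) / hi for c in counts]
```

A vector like this, computed per protein, could then be concatenated with other features and passed to any of the classifiers named above (KNN, RF, SVM, or an MLP); how the authors combined features is not stated in the abstract.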
About the journal:
Gene publishes papers that focus on the regulation, expression, function and evolution of genes in all biological contexts, including all prokaryotic and eukaryotic organisms, as well as viruses.