A Deep Learning Framework for Identifying Essential Proteins Based on Protein-Protein Interaction Network and Gene Expression Data

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2018-12-01. DOI: 10.1109/BIBM.2018.8621551

Min Zeng, Min Li, Zhihui Fei, Fang-Xiang Wu, Yaohang Li, Yi Pan


Citations: 8


A Deep Learning Framework for Identifying Essential Proteins Based on Protein-Protein Interaction Network and Gene Expression Data

Identifying essential proteins is vital for disease research and drug design. Many topology-based and machine learning-based methods have been proposed for this task. However, traditional topology-based methods focus only on explicitly defined characteristics of network topology and are not expressive enough to capture the complexity of the connectivity patterns observed in biological networks. In addition, identifying essential proteins is an imbalanced learning problem, because non-essential proteins significantly outnumber essential ones, and few machine learning-based methods take this imbalance into account. We propose a new deep learning framework to address these limitations. Our model uses the node2vec technique to learn topological features from the protein-protein interaction (PPI) network without manual feature selection. To handle the imbalanced dataset, we use a sampling method that biases no single training step toward either the majority or the minority class and tends to make full use of all samples over the whole training process. We evaluate the model on an S. cerevisiae dataset. The results show that it greatly outperforms topology-based methods including DC, BC, CC, EC, NC, LAC, PeC and WDC. It also outperforms machine learning-based methods including support vector machine (SVM), decision tree, random forest and AdaBoost.
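The abstract does not spell out the sampling procedure, so the following is only one plausible reading of it, not the authors' implementation. The sketch assumes binary labels where 1 marks the minority class (essential) and 0 the majority class (non-essential). In each epoch it pairs all minority samples with an equally sized draw from the majority class, and it cycles through a shuffled majority pool so that every majority sample is used before any is reused. The function name `balanced_epochs` and its parameters are hypothetical.

```python
import random


def balanced_epochs(labels, num_epochs, seed=0):
    """Yield one class-balanced list of sample indices per epoch.

    Assumption (not from the paper): label 1 is the minority class and
    label 0 is the majority class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    pool = []  # majority indices not yet used in the current pass
    for _ in range(num_epochs):
        drawn = []
        while len(drawn) < len(minority):
            if not pool:
                # Start a new pass over the majority class only once the
                # previous pass has been fully consumed.
                pool = majority[:]
                rng.shuffle(pool)
            drawn.append(pool.pop())
        epoch = minority + drawn
        rng.shuffle(epoch)  # mix the two classes within the epoch
        yield epoch


if __name__ == "__main__":
    # Toy data: 3 essential (1) and 10 non-essential (0) proteins.
    toy_labels = [1] * 3 + [0] * 10
    for epoch_indices in balanced_epochs(toy_labels, num_epochs=4):
        print(sorted(epoch_indices))
```

Each yielded index list can be used to slice the feature matrix for one training epoch. Compared with undersampling once before training, this keeps every step balanced while still exposing the model to the full majority class over time, which is the property the abstract emphasizes.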
