Identifying protein subcellular location with embedding features learned from networks
Hongwei Liu, Bin Hu, Lei Chen, Lin Lu
Current Proteomics, published 2020-11-24
DOI: 10.2174/1570164617999201124142950 (https://doi.org/10.2174/1570164617999201124142950)
Citations: 36
Abstract
Identifying protein subcellular location is an important problem because a protein's location
is closely related to its function. Determining locations through biological experiments is the fundamental approach. However,
these experiments are costly and time-consuming. An alternative is to design effective
computational methods.
To date, several computational methods have been proposed for this task. However, these methods mainly
use features derived from the proteins themselves. Meanwhile, with the development of network techniques, several
embedding algorithms have been proposed that encode the nodes of a network into feature vectors. Such algorithms
bridge networks and traditional classification algorithms, and thus provide a new way to construct models
for predicting protein subcellular location.
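As an illustration of how a node in a network can end up as a feature vector: DeepWalk starts by sampling truncated random walks from every node and then treats those walks like sentences for a skip-gram model. The sketch below covers only the walk-sampling stage, using the Python standard library. The toy network, the protein names and the parameter values are hypothetical and are not taken from the study.

```python
import random

def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # isolated node: the walk cannot continue
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical toy protein network (undirected), stored as adjacency lists.
toy_network = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3"],
}

walks = random_walks(toy_network, walk_length=5, walks_per_node=2)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

In a full pipeline the walks would be passed to a skip-gram trainer to obtain one vector per protein. Node2vec follows the same outline but biases the choice of the next node.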
In this study, we analyzed features produced by three network embedding algorithms (DeepWalk, Node2vec and
Mashup) applied to one or multiple protein networks. The resulting features were fed into a machine learning algorithm
(support vector machine or random forest) to construct a model. All constructed models were evaluated
by cross-validation.
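The evaluation step can be sketched as k-fold cross-validation over embedding vectors. This is a minimal standard-library illustration under several assumptions: the abstract does not state the number of folds (k=5 here is arbitrary), the two-dimensional vectors and location labels are synthetic, and a nearest-centroid classifier stands in for the support vector machine and random forest used in the study.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Stand-in classifier (not SVM/RF): store the mean vector of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, return overall accuracy."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i]
                       for i in test_idx)
    return correct / len(X)

# Synthetic "embedding features": two well-separated clusters, one per location.
rng = random.Random(1)
X = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
     + [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(20)])
y = ["nucleus"] * 20 + ["cytoplasm"] * 20

accuracy = cross_validate(X, y, k=5)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Each sample is tested exactly once by a model that never saw it during training, which is what makes the resulting accuracy a fair basis for comparing feature sets.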
Cross-validation showed that the embedding features yielded by Mashup on multiple networks
were quite informative for predicting protein subcellular location. The model based on these features was superior to
some classic models.
Embedding features yielded by a suitable and powerful network embedding algorithm were effective for building
models that predict protein subcellular location, providing new pipelines for building more efficient models.
Current Proteomics (BIOCHEMICAL RESEARCH METHODS; BIOCHEMISTRY & MOLECULAR BIOLOGY)
CiteScore: 1.60
Self-citation rate: 0.00%
Articles published: 25
Review time: >0 weeks
Journal description:
Research in the emerging field of proteomics is growing at an extremely rapid rate. The principal aim of Current Proteomics is to publish well-timed in-depth/mini review articles in this fast-expanding area on topics relevant and significant to the development of proteomics. Current Proteomics is an essential journal for everyone involved in proteomics and related fields in both academia and industry.
Current Proteomics publishes in-depth/mini review articles in all aspects of the fast-expanding field of proteomics. All areas of proteomics are covered together with the methodology, software, databases, technological advances and applications of proteomics, including functional proteomics. Diverse technologies covered include but are not limited to:
Protein separation and characterization techniques
2-D gel electrophoresis and image analysis
Techniques for protein expression profiling including mass spectrometry-based methods and algorithms for correlative database searching
Determination of co-translational and post-translational modification of proteins
Protein/peptide microarrays
Biomolecular interaction analysis
Analysis of protein complexes
Yeast two-hybrid projects
Protein-protein interaction (protein interactome) pathways and cell signaling networks
Systems biology
Proteome informatics (bioinformatics)
Knowledge integration and management tools
High-throughput protein structural studies (using mass spectrometry, nuclear magnetic resonance and X-ray crystallography)
High-throughput computational methods for protein 3-D structure as well as function determination
Robotics, nanotechnology, and microfluidics.