Protein Function Prediction: Combining Statistical Features with Deep Learning
Deepa Kumari, Ashish Ranjan, A. Deepak
Materials Processing & Manufacturing eJournal, vol. 29, no. 1, published 2019-02-08
DOI: 10.2139/ssrn.3349575
Citations: 1
Abstract
Functional annotation of proteins from their sequences, which aims to reduce the gap between the available protein sequences and their known functional annotations, is a challenging task. From a computational perspective, it requires transforming protein sequences into feature vectors that machine learning algorithms can analyze efficiently. However, such a transformation is difficult because of the high diversity among protein sequences from the same family. Most existing sequence features perform poorly when annotating proteins with a large number of functional classes. In this paper, three sequence features (PseAAC, AAID and SGT) are combined with deep learning techniques to improve performance. Evaluation scores show better results when the features are combined with a deep convolutional neural network (CNN) rather than a deep neural network (DNN): the F1-score of PseAAC + CNN improves by +9.5% over PseAAC + DNN, and the corresponding improvements for AAID + CNN and SGT + CNN over their DNN counterparts are +3.22% and +2.33%, respectively.
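To illustrate what "transforming protein sequences into feature vectors" means for one of the three features, here is a minimal sketch of a type-I pseudo amino acid composition (PseAAC) in the style of Chou. It is not the authors' implementation: the abstract does not state the physicochemical properties, the number of correlation factors λ, or the weight w they used, so this sketch assumes a single stand-in property (the Kyte-Doolittle hydropathy scale), `lam=3` and `w=0.05`. The names `pseaac` and `standardize` are hypothetical. AAID and SGT are not covered here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy as a single stand-in property.
# Chou's original PseAAC uses hydrophobicity, hydrophilicity and side-chain mass.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property table to zero mean and unit (population) std over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq, lam=3, w=0.05, props=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional type-I PseAAC-style vector (hypothetical helper)."""
    seq = seq.upper()
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = [standardize(p) for p in props]

    def theta(a, b):
        # Mean squared difference of the standardized properties of two residues.
        return sum((hk[b] - hk[a]) ** 2 for hk in h) / len(h)

    L = len(seq)
    # Sequence-order correlation factors theta_1 .. theta_lam (tier j = residue gap j).
    thetas = [
        sum(theta(seq[i], seq[i + j]) for i in range(L - j)) / (L - j)
        for j in range(1, lam + 1)
    ]

    counts = Counter(seq)
    freqs = [counts[a] / L for a in AMINO_ACIDS]  # plain amino acid composition
    denom = sum(freqs) + w * sum(thetas)
    # First 20 entries: composition; last lam entries: weighted sequence-order terms.
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

For example, `pseaac("MKTAYIAKQR")` returns a 23-dimensional vector whose entries sum to 1. Whatever the sequence length, the output has a fixed size, which is what lets a DNN or CNN classifier consume it. Larger `lam` captures longer-range sequence-order effects but requires longer sequences.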