Protein Subcellular and Secreted Localization Prediction Using Deep Learning

H. Zidoum, M. Magdy
2018 International Conference on Computing Sciences and Engineering (ICCSE), published 2018-06-05
DOI: 10.1109/ICCSE1.2018.8374220 (https://doi.org/10.1109/ICCSE1.2018.8374220)
Citations: 0

Protein Subcellular and Secreted Localization Prediction Using Deep Learning
Predicting a protein's structure and inferring its function from its location in the cell is crucial for understanding cellular translocation and has direct applications in drug discovery. Computational prediction of protein localization is an alternative to time-consuming experimental approaches. We use a deep learning approach to improve prediction accuracy while reducing the time needed to predict the localization site of uncharacterized protein sequences. Our approach is based on general biological features of the protein sequence and compartment-specific features, to which we added physico-chemical sequence features. We collected the protein sequences from UniProt/SWISS-PROT and then collected the features for each protein. The dataset covers five locations: cytoplasm (CP), inner membrane (IM), outer membrane (OM), periplasm (PE), and secreted (SEC). We selected protein sequences between 100 and 1000 amino acids in length, with 500 protein sequences per location. We propose a deep learning prediction method for bacterial taxonomy that combines one-versus-one and one-versus-all models with feature selection using linear SVM ranking and deep auto-encoders to initialize the weights. The method achieves an overall accuracy of 97.81% using 10-fold cross-validation on our data. Our approach outperforms current state-of-the-art computational methods for protein subcellular localization on the selected dataset.
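As a rough illustration of the dataset construction and evaluation protocol described in the abstract, the Python sketch below filters sequences to the 100 to 1000 residue range, draws up to 500 sequences per location, and assigns samples to 10 cross-validation folds. The function names, the (sequence, location) input format, the random sampling, and the per-class (stratified) fold assignment are all assumptions made for illustration. The abstract does not say how the 500 sequences per location were chosen or how the folds were built, and this is not the authors' code.

```python
import random
from collections import defaultdict

# Location codes and dataset limits taken from the abstract.
LOCATIONS = ("CP", "IM", "OM", "PE", "SEC")
MIN_LEN, MAX_LEN, PER_CLASS, N_FOLDS = 100, 1000, 500, 10


def select_balanced(records, per_class=PER_CLASS, seed=0):
    """Keep sequences of 100-1000 residues and draw up to `per_class` per location.

    `records` is an iterable of (sequence, location) pairs (hypothetical format).
    Random sampling is an assumption; the abstract does not describe the selection.
    """
    by_location = defaultdict(list)
    for sequence, location in records:
        if location in LOCATIONS and MIN_LEN <= len(sequence) <= MAX_LEN:
            by_location[location].append(sequence)
    rng = random.Random(seed)
    selected = []
    for location in LOCATIONS:
        pool = by_location[location]
        chosen = rng.sample(pool, min(per_class, len(pool)))
        selected.extend((sequence, location) for sequence in chosen)
    return selected


def stratified_folds(labels, n_folds=N_FOLDS, seed=0):
    """Assign each sample index to one of `n_folds` folds, class by class.

    Stratification is an assumption; the abstract only states 10-fold cross-validation.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so each fold gets an equal share.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds
```

With 500 sequences in each of the five locations, each fold produced this way holds 250 samples, 50 per location. In each round one fold serves as the test set and the remaining nine as training data.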