Authors: H. Zidoum, M. Magdy
Published in: 2018 International Conference on Computing Sciences and Engineering (ICCSE), 2018-06-05
DOI: https://doi.org/10.1109/ICCSE1.2018.8374220
Protein Subcellular and Secreted Localization Prediction Using Deep Learning
Predicting protein structure and discovering a protein's function from its location in the cell is crucial for understanding the cellular translocation process and has direct applications in drug discovery. Computational prediction of protein localization is an alternative to the time-consuming experimental approach. We use a deep learning approach to improve prediction accuracy while reducing the time needed to predict the localization site of uncharacterized protein sequences. Our approach is based on general biological features of the protein sequence and on compartment-specific features, to which we added physico-chemical sequence features. We collected the protein sequences from UniProt/SWISS-PROT and then collected the features for each protein. We consider five locations in the dataset, namely cytoplasm (CP), inner membrane (IM), outer membrane (OM), periplasm (PE) and secreted (SEC). We select protein sequences that are at least 100 and at most 1000 amino acids long. Each location contains 500 protein sequences. We propose a deep learning prediction method for the bacteria taxonomy that combines one-versus-one and one-versus-all models with feature selection using linear SVM ranking, and uses deep auto-encoders to initialize the weights. The method achieves an overall accuracy of 97.81% using 10-fold cross-validation on our data. Our approach outperforms the current state-of-the-art computational methods for protein subcellular localization on the selected dataset.
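The dataset construction and evaluation protocol described above can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract gives only the five location labels, the 100 to 1000 amino acid length bounds, the 500 sequences per location, and the use of 10-fold cross-validation. Everything else here is an assumption made for illustration: the `(sequence, location)` record format, the random sampling, the stratified fold assignment, and the helper names `select_sequences`, `stratified_folds` and `binary_tasks`.

```python
import random
from collections import defaultdict
from itertools import combinations

LOCATIONS = ["CP", "IM", "OM", "PE", "SEC"]  # location labels from the abstract
MIN_LEN, MAX_LEN = 100, 1000                  # length bounds from the abstract
PER_CLASS = 500                               # sequences per location


def select_sequences(records, per_class=PER_CLASS, seed=0):
    """Keep sequences within the length bounds, then sample up to
    per_class sequences for each location.

    records: iterable of (sequence, location) pairs (hypothetical format).
    """
    by_loc = defaultdict(list)
    for seq, loc in records:
        if loc in LOCATIONS and MIN_LEN <= len(seq) <= MAX_LEN:
            by_loc[loc].append(seq)
    rng = random.Random(seed)
    dataset = []
    for loc in LOCATIONS:
        seqs = by_loc[loc]
        rng.shuffle(seqs)  # random sampling is an assumption
        dataset.extend((s, loc) for s in seqs[:per_class])
    return dataset


def stratified_folds(labels, k=10, seed=0):
    """Assign every sample index to one of k folds so that each label is
    spread as evenly as possible (stratification is an assumption)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def binary_tasks(classes):
    """List the binary sub-problems of the one-versus-one and
    one-versus-all decompositions of a multi-class problem."""
    ovo = list(combinations(classes, 2))   # one model per pair of classes
    ova = [(c, "rest") for c in classes]   # one model per class vs. the rest
    return ovo, ova
```

With five locations, `binary_tasks(LOCATIONS)` yields 10 one-versus-one pairs and 5 one-versus-all problems. With 500 sequences per location and `k=10`, each fold from `stratified_folds` holds 250 samples, 50 per location. How the outputs of the binary models are combined, and how the features are ranked and the auto-encoders trained, is not stated in the abstract and is left out of the sketch.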