Authors: S. M. S. Islam, M. Hasan
Published in: 2018 21st International Conference of Computer and Information Technology (ICCIT), 2018-10-31
DOI: 10.1109/ICCITECHN.2018.8631921
DEEPGONET: Multi-Label Prediction of GO Annotation for Protein from Sequence Using Cascaded Convolutional and Recurrent Network
The gap between the volume of protein sequence data produced by next-generation sequencing (NGS) technology and the slow, expensive experimental extraction of useful information from it, such as functional annotation of protein sequences, keeps widening. Automatic function prediction (AFP) approaches can narrow this gap. Gene Ontology (GO), comprising more than 40,000 classes, defines three aspects of protein function: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). Because a single protein can have multiple functions, automatic function prediction is a large-scale, multi-class, multi-label task. In this paper, we present DEEPGONET, a novel cascaded convolutional and recurrent neural network that predicts the top-level hierarchy of the GO ontology. The network takes only the primary protein sequence as input, which makes it more widely applicable than prevailing state-of-the-art deep learning methods that rely on multi-modal input and are therefore less applicable to proteins for which only the primary sequence is available. The same architecture performs all predictions across the different protein functions, and its promising performance on a variety of organisms, despite training on Homo sapiens only, is evidence of better generalization. The task is made tractable by exploring the vast output space efficiently, leveraging the hierarchical relationships among GO classes. This performance makes the model a potential tool for directing experimental exploration of protein function: by pursuing only the routes the model suggests, experimenters can eliminate a large share of the possible routes. The proposed model is also simple and efficient in computational time and space compared with other architectures in the literature.
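The abstract does not state how the hierarchical relationships among GO classes are exploited, so the following is only a minimal sketch of one common convention, the true-path rule, under which a protein annotated with a term is also annotated with all of that term's ancestors. The toy hierarchy, the term names, the scores, and the 0.5 threshold are all hypothetical placeholders, not taken from the paper.

```python
# Hypothetical toy fragment of a GO-like hierarchy: term -> list of parent terms.
# These names are illustrative placeholders, not real GO identifiers.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "ion_binding": ["binding"],
    "kinase": ["catalytic"],
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(scores, parents=PARENTS):
    """Raise each ancestor's score to at least the score of its descendants."""
    consistent = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            consistent[anc] = max(consistent.get(anc, 0.0), score)
    return consistent


def predict_labels(scores, threshold=0.5, parents=PARENTS):
    """Multi-label decision: keep every term whose propagated score passes the threshold."""
    consistent = propagate(scores, parents)
    return {term for term, score in consistent.items() if score >= threshold}


# Assumed per-term scores, e.g. sigmoid outputs of some sequence model.
example_scores = {
    "root": 0.9,
    "binding": 0.3,
    "catalytic": 0.2,
    "ion_binding": 0.8,
    "kinase": 0.1,
}
predicted = predict_labels(example_scores)
```

In this sketch a confident prediction for a specific term pulls its less confident ancestors above the threshold, so the returned label set is always closed under the parent relation. A real system would load the hierarchy from the GO release files rather than a hand-written dictionary.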