M. R. Amin, Alisa Yurovsky, Yingtao Tian, S. Skiena
{"title":"DeepAnnotator:基因组注释与深度学习","authors":"M. R. Amin, Alisa Yurovsky, Yingtao Tian, S. Skiena","doi":"10.1145/3233547.3233577","DOIUrl":null,"url":null,"abstract":"Genome annotation is the process of labeling DNA sequences of an organism with its biological features, and is one of the fundamental problems in Bioinformatics. Public annotation pipelines such as NCBI integrate a variety of algorithms and homology searches on public and private databases. However, they build on the information of varying consistency and quality, produced over the last two decades. We identified 12,415 errors in NCBI RNA gene annotations, demonstrating the need for improved annotation programs. We use Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) to demonstrate the potential of deep learning networks to annotate genome sequences, and evaluate different approaches on prokaryotic sequences from NCBI database. Particularly, we evaluate DNA $K-$mer embeddings and the application of RNNs for genome annotation. We show how to improve the performance of our deep networks by incorporating intermediate objectives and downstream algorithms to achieve better accuracy. Our method, called DeepAnnotator, achieves an F-score of ~94%, and establishes a generalized computational approach for genome annotation using deep learning. Our results are very encouraging as our method eliminates the requirement of hand crafted features and motivates further research in application of deep learning to full genome annotation. DeepAnnotator algorithms and models can be accessed in Github: \\urlhttps://github.com/ruhulsbu/DeepAnnotator.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"138 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"DeepAnnotator: Genome Annotation with Deep Learning\",\"authors\":\"M. R. Amin, Alisa Yurovsky, Yingtao Tian, S. Skiena\",\"doi\":\"10.1145/3233547.3233577\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Genome annotation is the process of labeling DNA sequences of an organism with its biological features, and is one of the fundamental problems in Bioinformatics. Public annotation pipelines such as NCBI integrate a variety of algorithms and homology searches on public and private databases. However, they build on the information of varying consistency and quality, produced over the last two decades. We identified 12,415 errors in NCBI RNA gene annotations, demonstrating the need for improved annotation programs. We use Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) to demonstrate the potential of deep learning networks to annotate genome sequences, and evaluate different approaches on prokaryotic sequences from NCBI database. Particularly, we evaluate DNA $K-$mer embeddings and the application of RNNs for genome annotation. We show how to improve the performance of our deep networks by incorporating intermediate objectives and downstream algorithms to achieve better accuracy. Our method, called DeepAnnotator, achieves an F-score of ~94%, and establishes a generalized computational approach for genome annotation using deep learning. Our results are very encouraging as our method eliminates the requirement of hand crafted features and motivates further research in application of deep learning to full genome annotation. DeepAnnotator algorithms and models can be accessed in Github: \\\\urlhttps://github.com/ruhulsbu/DeepAnnotator.\",\"PeriodicalId\":131906,\"journal\":{\"name\":\"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics\",\"volume\":\"138 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3233547.3233577\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3233547.3233577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DeepAnnotator: Genome Annotation with Deep Learning
Genome annotation is the process of labeling DNA sequences of an organism with its biological features, and is one of the fundamental problems in Bioinformatics. Public annotation pipelines such as NCBI integrate a variety of algorithms and homology searches on public and private databases. However, they build on the information of varying consistency and quality, produced over the last two decades. We identified 12,415 errors in NCBI RNA gene annotations, demonstrating the need for improved annotation programs. We use Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) to demonstrate the potential of deep learning networks to annotate genome sequences, and evaluate different approaches on prokaryotic sequences from NCBI database. Particularly, we evaluate DNA $K-$mer embeddings and the application of RNNs for genome annotation. We show how to improve the performance of our deep networks by incorporating intermediate objectives and downstream algorithms to achieve better accuracy. Our method, called DeepAnnotator, achieves an F-score of ~94%, and establishes a generalized computational approach for genome annotation using deep learning. Our results are very encouraging as our method eliminates the requirement of hand crafted features and motivates further research in application of deep learning to full genome annotation. DeepAnnotator algorithms and models can be accessed in Github: \urlhttps://github.com/ruhulsbu/DeepAnnotator.