New automatic and effective tools for genome annotation
M. Borodovsky
IEEE International Conference on Computational Advances in Bio and Medical Sciences, published 2017-10-01
DOI: https://doi.org/10.1109/ICCABS.2017.8114287
Gene prediction and annotation play a central role in genomics. However, despite much attention, open problems remain and stimulate the search for new algorithmic solutions in all categories of gene finding. Prokaryotic genes can be identified with higher average accuracy than eukaryotic ones. Nevertheless, the error rate is not negligible and is largely species-specific. Our prokaryotic gene finder GeneMarkS, a self-training tool that works in iterations, has been used in many genome projects [1]. In the new version, GeneMarkS-2, we introduced a series of heuristic models for training initialization, classification of genomes with respect to gene start organization, and an adaptive process of model structure modification. We used multiple tests to assess the accuracy of the new tool as well as that of several other current gene finders. GeneMark-ES, a self-training tool for gene annotation in eukaryotic genomes, has been constantly updated and has been used since 2007 in a number of genome projects conducted by the DOE Joint Genome Institute and the Broad Institute. This tool was recently extended to the fully automated GeneMark-ET [2], which integrates information on RNA-Seq reads mapped to the genome. Another extension, GeneMark-EP, uses genomic footprints of homologous proteins. Both algorithms take similar approaches to filtering out errors that arise in the processing of external evidence. The metagenomic gene finder MetaGeneMark [3] has been employed in IMG/M at the DOE Joint Genome Institute for metagenome annotation. This tool was further developed to call genes in fungal metagenomes. Finally, BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation, combines the advantages of GeneMark-ET and AUGUSTUS [4]. All the tools described above can be applied to the analysis of newly assembled NGS genomes without any additional preparation steps.