GRASP2: Fast and memory-efficient gene-centric assembly and homolog search

Cuncong Zhong, Youngik Yang, Shibu Yooseph
DOI: 10.1109/ICCABS.2017.8114296 (https://doi.org/10.1109/ICCABS.2017.8114296)
Published in: IEEE International Conference on Computational Advances in Bio and Medical Sciences (proceedings)
Publication date: 2017-10-01
Citations: 1

Abstract

A crucial task in metagenomic analysis is to annotate the function and taxonomy of the sequencing reads generated from a microbiome sample. In general, the reads can either be assembled into contigs and searched against reference databases, or searched individually without assembly. The first approach may suffer from the fragmentary and incomplete nature of nucleotide sequence assembly, while the second is hampered by the limited functional signal that a short read can contain. To tackle these issues, we previously developed GRASP (Guided Reference-based Assembly of Short Peptides), which accepts a reference protein sequence as input and aims to assemble its homologs from a database of fragmentary protein sequences. In addition to being a gene-centric assembly tool, GRASP also serves as a homolog search tool when the assembled protein sequences are used as templates to recruit reads. GRASP has significantly higher sensitivity (60–80% vs. 30–40%) than other homolog search tools such as BLAST. However, GRASP consumes more time and memory than these tools and does not scale to large datasets. Subsequently, we developed GRASPx, which is 30-fold faster than GRASP. Here, we present a completely redesigned algorithm, GRASP2, for this computational problem. GRASP2 uses the Burrows-Wheeler Transform (BWT) to assist with assembly graph generation, and it shrinks the search space with a fast ungapped alignment strategy that avoids unnecessary traversal of non-homologous paths in the assembly graph. GRASP2 is 8-fold faster than GRASPx (and 250-fold faster than GRASP) and uses 8-fold less memory while maintaining the original high sensitivity of GRASP, which makes GRASP2 a useful tool for metagenomics data analysis. GRASP2 is implemented in C++ and is freely available from http://www.sourceforge.net/projects/grasp2.
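
The two ideas named in the abstract, BWT-based lookup and ungapped alignment as a cheap filter, can be illustrated with a toy Python sketch. This is not GRASP2's implementation (which is in C++ and is not described in detail here). The function names `bwt_index`, `count_occurrences`, and `best_ungapped_score` are hypothetical, the index is built naively by sorting suffixes, and the scoring uses a simple match/mismatch scheme as an assumption in place of a protein substitution matrix.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy BWT index (C table and Occ counts) for text.

    Assumption: naive suffix sorting, adequate only for small examples.
    """
    text += "$"  # sentinel that sorts before every letter
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = character preceding each sorted suffix (wraps to '$').
    bwt = "".join(text[i - 1] for i in suffix_array)
    # c_table[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for key in occ:
            occ[key][i + 1] = occ[key][i] + (key == ch)
    return c_table, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count occurrences of pattern in the indexed text via backward search."""
    c_table, occ, n = index
    lo, hi = 0, n  # half-open range of matching rows in the sorted suffixes
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


def best_ungapped_score(query, read, match=1, mismatch=-1):
    """Best local alignment score over all diagonals, with no gaps allowed.

    Assumption: match/mismatch scoring stands in for a substitution matrix.
    """
    best = 0
    for offset in range(-(len(read) - 1), len(query)):
        running = 0  # best-scoring segment ending at the current position
        for j, residue in enumerate(read):
            i = offset + j
            if 0 <= i < len(query):
                step = match if query[i] == residue else mismatch
                running = max(0, running + step)
                best = max(best, running)
    return best
```

In a pipeline of this general shape, a lookup like `count_occurrences` would locate sequences that share a seed with the current path, and a threshold on `best_ungapped_score` would discard candidates unlikely to be homologous before any more expensive gapped alignment is attempted. The threshold value itself would have to be chosen empirically.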