An FPGA-Based Data Pre-Processing Architecture to Accelerate De-Novo Genome Assembly

Georgios Galanos, Pavlos Malakonakis, A. Dollas
2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE), published 2021-10-25. DOI: 10.1109/BIBE52308.2021.9635499 (https://doi.org/10.1109/BIBE52308.2021.9635499)

Abstract

Genome assembly is the process of taking small fragments of genetic material and putting them back together to reconstruct the original DNA sequence from which they originated. Because the input datasets for genome assembly are in most cases very large, custom architectures are important for speeding up these processes and significantly reducing execution time. In this paper we present the Reads Matching Filter (RMF), an input dataset pre-filtering process based on string matching and implemented on Field Programmable Gate Array (FPGA) technology, which reduces genome assembly execution time. The outputs of the RMF running on the FPGA, together with the original input dataset, are given as input to the Velvet genome assembler, which produces the assembly of the input sequences. Velvet is based on the manipulation of de Bruijn graphs and produces its output by removing errors and simplifying repeated regions. The FPGA-based RMF pre-filtering process speeds up the entire genome assembly process, including I/O, by up to 6 times, while maintaining the quality of the output contigs (i.e., the series of overlapping DNA sequences).
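The two ideas named in the abstract, string-matching pre-filtering of reads and de Bruijn graph construction, can be illustrated with a minimal software sketch. The abstract does not state the RMF's matching criteria, so the exact-duplicate filter below is an assumption chosen purely for illustration and is not the authors' design. The function names and the toy reads are likewise hypothetical, and the graph builder shows only the general k-mer technique, not Velvet's implementation.

```python
from collections import defaultdict


def filter_duplicate_reads(reads):
    """Keep the first occurrence of each read, using exact string matching.

    Assumption: a stand-in for a read pre-filter; the real RMF criteria
    are not given in the abstract.
    """
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes following it."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy input (hypothetical): the third read duplicates the first.
reads = ["ACGTAC", "CGTACG", "ACGTAC"]
filtered = filter_duplicate_reads(reads)
graph = build_de_bruijn_graph(filtered, k=4)
```

In this sketch the filter shrinks the read set before graph construction, which is the general reason a pre-filter can reduce assembly time: fewer reads mean fewer k-mers to insert into the graph.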