An FPGA-Based Data Pre-Processing Architecture to Accelerate De-Novo Genome Assembly

2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE). Pub Date: 2021-10-25. DOI: 10.1109/BIBE52308.2021.9635499

Georgios Galanos, Pavlos Malakonakis, A. Dollas



Abstract

Genome assembly is a field of bioinformatics concerned with taking small fragments of genetic material and putting them back together to reconstruct the original DNA sequence from which the fragments originated. Because genome assembly input datasets are in most cases very large, custom architectures are important for speeding up these processes and significantly reducing execution time. In this paper we present the Reads Matching Filter (RMF), an input dataset pre-filtering process based on string matching and implemented on Field Programmable Gate Array (FPGA) technology, which reduces genome assembly execution time. The outputs of the RMF running on the FPGA, as well as the original input dataset, are given as input to the Velvet genome assembler, which produces the assembly of the input sequences. The Velvet genome assembler is based on the manipulation of de Bruijn graphs and produces its output by removing errors and simplifying repeated regions. The FPGA-based RMF pre-filtering process speeds up the entire genome assembly process, including I/O, by up to 6 times while maintaining the quality of the output sequence contigs (i.e., the series of overlapping DNA sequences).
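The abstract does not describe how the RMF works internally, so the following is only a minimal sketch of the general idea, under stated assumptions. It builds a simplified de Bruijn graph (nodes are (k-1)-mers, edges are k-mers) and applies a hypothetical pre-filter, `filter_duplicate_reads`, which drops exact duplicate reads by string matching. Both function names and the filtering rule are assumptions for illustration and are not taken from the paper or from Velvet. The sketch also ignores k-mer coverage counts, which a real assembler such as Velvet does use.

```python
from collections import defaultdict


def filter_duplicate_reads(reads):
    """Hypothetical pre-filter: keep only the first occurrence of each read.

    This is an illustrative stand-in for a string-matching filter,
    not the RMF described in the paper.
    """
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Simplified: edge multiplicity (coverage) is not tracked.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "ACGTAC", "GTACGT", "CGTACG"]
    filtered = filter_duplicate_reads(reads)
    # The filtered input is smaller but yields the same simplified graph.
    assert de_bruijn_graph(reads, 4) == de_bruijn_graph(filtered, 4)
    print(len(reads), "reads before filtering,", len(filtered), "after")
```

In this toy setting the filtered dataset produces the same graph topology as the full dataset, which illustrates why shrinking the input before assembly can reduce work without necessarily changing the resulting contigs. Whether that holds for a real assembler depends on how it uses coverage information.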

