SALIENT: Ultra-Fast FPGA-based Short Read Alignment

2022 International Conference on Field-Programmable Technology (ICFPT). Pub Date: 2022-12-05. DOI: 10.1109/ICFPT56656.2022.9974548

Behnam Khaleghi, Tianqi Zhang, C. Martino, George Armstrong, Ameen Akel, Ken Curewitz, Justin Eno, S. Eilert, Rob Knight, Niema Moshiri, Tajana Rosing



Abstract


SALIENT: Ultra-Fast FPGA-based Short Read Alignment

State-of-the-art high-throughput DNA sequencers output terabytes of short reads that typically need to be aligned to a reference genome before downstream analyses can be performed. Because alignment typically dominates the total run time of bioinformatics pipelines, a number of recent works have sought to accelerate it in hardware. However, existing FPGA implementations did not fully optimize the alignment algorithms for the FPGA hardware and mainly focused on a subset of alignment problems, e.g., ungapped alignment with a limited number of mismatches, which hinders their practical utility. In this work, we analyze the existing alignment methods and identify and leverage opportunities for FPGA acceleration. Our alignment framework, SALIENT, first carries out an ultra-fast ungapped alignment, which supports a flexible number of mismatches. Based on the underlying bioinformatics pipeline and the information provided by the ungapped aligner, SALIENT then identifies a fraction of reads that need to go through its gapped aligner, thus improving alignment throughput. We extensively evaluate SALIENT using diverse datasets. Experimental results indicate that SALIENT, running on a single Xilinx Alveo U280 device, delivers an average throughput of 546 million bases/second, outperforming the state-of-the-art minimap2 software by 40× and Bowtie2 by up to 107×, with similar or slightly better (~0.1%–0.5%) alignment and error (false negative/positive) rates. Compared to the existing ungapped FPGA aligners [1]–[4], SALIENT has 9.4–18× higher throughput/Watt, while compared to the gapped aligners [5], [6], it is 28–35× better. SALIENT achieves 7.6× higher throughput than the Illumina DRAGEN Bio-IT Platform [7].
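The two-stage flow described in the abstract (a fast ungapped pass with a mismatch budget, followed by gapped alignment only for the reads that need it) can be illustrated with a small software sketch. This is a toy model under stated assumptions, not SALIENT's FPGA design: the function names `ungapped_align`, `gapped_align`, and `align` are hypothetical, the ungapped stage is a plain Hamming-distance scan, the gapped stage is a textbook semi-global edit-distance dynamic program, and the decision to fall back is simply "no ungapped hit within the budget", whereas the abstract says SALIENT also uses information from the downstream pipeline.

```python
from typing import Optional, Tuple


def ungapped_align(read: str, ref: str, max_mismatches: int) -> Optional[Tuple[int, int]]:
    """Return (position, mismatches) of the best ungapped placement, or None.

    Hypothetical stand-in for an ungapped aligner: slides the read along the
    reference and counts substitutions only (no insertions or deletions).
    """
    best = None
    for pos in range(len(ref) - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, ref[pos:pos + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # over budget at this position, try the next one
        else:
            # Loop finished without exceeding the budget: a valid placement.
            if best is None or mismatches < best[1]:
                best = (pos, mismatches)
    return best


def gapped_align(read: str, ref: str) -> Tuple[int, int]:
    """Semi-global edit distance of the whole read against any substring of ref.

    Returns (end_position_in_ref, edit_distance). Textbook DP used here as an
    assumption; it is not the gapped algorithm from the paper.
    """
    prev = [0] * (len(ref) + 1)  # row 0: the read may start anywhere in ref
    for i in range(1, len(read) + 1):
        curr = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # extra base in the read
                          curr[j - 1] + 1)     # base missing from the read
        prev = curr
    end = min(range(len(ref) + 1), key=prev.__getitem__)
    return end, prev[end]


def align(read: str, ref: str, max_mismatches: int = 2) -> Tuple[str, int, int]:
    """Try the cheap ungapped stage first; fall back to the gapped stage."""
    hit = ungapped_align(read, ref, max_mismatches)
    if hit is not None:
        return ("ungapped", hit[0], hit[1])
    end, distance = gapped_align(read, ref)
    return ("gapped", end, distance)


if __name__ == "__main__":
    reference = "ACGTTAGCCGATAGGCTAAC"
    print(align("TAGCCGAT", reference))                     # exact substring
    print(align("TAGCAGAT", reference))                     # one substitution
    print(align("TAGCCATAG", reference, max_mismatches=1))  # one deleted base
```

In this sketch only the third read, which is missing a base relative to the reference, reaches the more expensive gapped stage; the first two are resolved by the ungapped scan. The throughput benefit claimed in the abstract comes from keeping that fraction of reads small.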
