Behnam Khaleghi, Tianqi Zhang, C. Martino, George Armstrong, Ameen Akel, Ken Curewitz, Justin Eno, S. Eilert, Rob Knight, Niema Moshiri, Tajana Rosing
{"title":"重点:基于fpga的超快速短读对齐","authors":"Behnam Khaleghi, Tianqi Zhang, C. Martino, George Armstrong, Ameen Akel, Ken Curewitz, Justin Eno, S. Eilert, Rob Knight, Niema Moshiri, Tajana Rosing","doi":"10.1109/ICFPT56656.2022.9974548","DOIUrl":null,"url":null,"abstract":"State-of-the-art high-throughput DNA sequencers output terabytes of short reads that typically need to be aligned to a reference genome in order to perform downstream analyses. Because alignment typically dominates the total run time of bioinformatics pipelines, a number of recent work sought to accelerate it in hardware. However, existing FPGA implemen-tations did not fully optimize the alignment algorithms for the FPGA hardware and mainly focused on a subset of alignment problems, e.g., ungapped alignment with a limited number of mismatches, which hinder their practical utility. In this work, we analyze the existing alignment methods and identify and leverage opportunities for FPGA acceleration. Our alignment framework, SALIENT, first carries out an ultra-fast ungapped alignment, which supports a flexible number of mismatches. Based on the underlying bioinformatics pipeline and the information provided by the ungapped aligner, SALIENT then identifies a fraction of reads that need to go through its gapped aligner, thus improving alignment throughput. We extensively evaluate SALIENT using diverse datasets. Experimental results indicate that SALIENT, running on a single Xilinx Alveo U280 device, delivers an average throughput of 546 million bases/second, outperforming the state- of-the-art minimap2 software by 40x, and Bowtie2 by up to 107 x, with a similar or slightly better (~O.l %-0.5 %) alignment and error (false negative/positive) rate. Compared to the existing ungapped FPGA aligners [1]–[4], SALIENT has 9.4-18x higher throughput/Watt, while compared to the gapped aligners [5], [6], it is 28–35 x better. SALIENT achieves 7.6 x higher throughput than Illumina DRAGEN Bio-IT Platform [7].","PeriodicalId":239314,"journal":{"name":"2022 International Conference on Field-Programmable Technology (ICFPT)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SALIENT: Ultra-Fast FPGA-based Short Read Alignment\",\"authors\":\"Behnam Khaleghi, Tianqi Zhang, C. Martino, George Armstrong, Ameen Akel, Ken Curewitz, Justin Eno, S. Eilert, Rob Knight, Niema Moshiri, Tajana Rosing\",\"doi\":\"10.1109/ICFPT56656.2022.9974548\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"State-of-the-art high-throughput DNA sequencers output terabytes of short reads that typically need to be aligned to a reference genome in order to perform downstream analyses. Because alignment typically dominates the total run time of bioinformatics pipelines, a number of recent work sought to accelerate it in hardware. However, existing FPGA implemen-tations did not fully optimize the alignment algorithms for the FPGA hardware and mainly focused on a subset of alignment problems, e.g., ungapped alignment with a limited number of mismatches, which hinder their practical utility. In this work, we analyze the existing alignment methods and identify and leverage opportunities for FPGA acceleration. Our alignment framework, SALIENT, first carries out an ultra-fast ungapped alignment, which supports a flexible number of mismatches. Based on the underlying bioinformatics pipeline and the information provided by the ungapped aligner, SALIENT then identifies a fraction of reads that need to go through its gapped aligner, thus improving alignment throughput. We extensively evaluate SALIENT using diverse datasets. Experimental results indicate that SALIENT, running on a single Xilinx Alveo U280 device, delivers an average throughput of 546 million bases/second, outperforming the state- of-the-art minimap2 software by 40x, and Bowtie2 by up to 107 x, with a similar or slightly better (~O.l %-0.5 %) alignment and error (false negative/positive) rate. Compared to the existing ungapped FPGA aligners [1]–[4], SALIENT has 9.4-18x higher throughput/Watt, while compared to the gapped aligners [5], [6], it is 28–35 x better. SALIENT achieves 7.6 x higher throughput than Illumina DRAGEN Bio-IT Platform [7].\",\"PeriodicalId\":239314,\"journal\":{\"name\":\"2022 International Conference on Field-Programmable Technology (ICFPT)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Field-Programmable Technology (ICFPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICFPT56656.2022.9974548\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT56656.2022.9974548","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SALIENT: Ultra-Fast FPGA-based Short Read Alignment
State-of-the-art high-throughput DNA sequencers output terabytes of short reads that typically need to be aligned to a reference genome in order to perform downstream analyses. Because alignment typically dominates the total run time of bioinformatics pipelines, a number of recent work sought to accelerate it in hardware. However, existing FPGA implemen-tations did not fully optimize the alignment algorithms for the FPGA hardware and mainly focused on a subset of alignment problems, e.g., ungapped alignment with a limited number of mismatches, which hinder their practical utility. In this work, we analyze the existing alignment methods and identify and leverage opportunities for FPGA acceleration. Our alignment framework, SALIENT, first carries out an ultra-fast ungapped alignment, which supports a flexible number of mismatches. Based on the underlying bioinformatics pipeline and the information provided by the ungapped aligner, SALIENT then identifies a fraction of reads that need to go through its gapped aligner, thus improving alignment throughput. We extensively evaluate SALIENT using diverse datasets. Experimental results indicate that SALIENT, running on a single Xilinx Alveo U280 device, delivers an average throughput of 546 million bases/second, outperforming the state- of-the-art minimap2 software by 40x, and Bowtie2 by up to 107 x, with a similar or slightly better (~O.l %-0.5 %) alignment and error (false negative/positive) rate. Compared to the existing ungapped FPGA aligners [1]–[4], SALIENT has 9.4-18x higher throughput/Watt, while compared to the gapped aligners [5], [6], it is 28–35 x better. SALIENT achieves 7.6 x higher throughput than Illumina DRAGEN Bio-IT Platform [7].