DH_Aligner: A fast short-read aligner on multicore platforms with AVX vectorization
Qiao Sun, Feng Chen, Leisheng Li, Huiyuan Li
Journal of Parallel and Distributed Computing, Volume 205, Article 105142 (published 2025-07-04)
DOI: 10.1016/j.jpdc.2025.105142
The rapid development of next-generation sequencing (NGS) technology produces massive volumes of genome data at much higher throughput than before, creating strong demand for fast and accurate downstream genetic analysis. As one of the first steps in a bioinformatics workflow, read alignment estimates where and how each read maps onto a given reference sequence. In this paper, we propose DH_Aligner, a fast and accurate short-read aligner designed and optimized for x86 multicore platforms with the AVX2/AVX-512 SIMD instruction sets. It follows a three-phase workflow of seeding, filtering, and extension, and provides an end-to-end solution for read alignment, from FASTQ input to SAM output. Thanks to a fast seeding scheme and a seed-filtering procedure, DH_Aligner avoids both a time-consuming seeding phase and the redundant work of aligning reads at likely incorrect locations. By adopting a batched-processing methodology, it readily exploits parallelism at the data, instruction, and thread levels. The performance-critical kernels of DH_Aligner are implemented with both AVX2 and AVX-512 intrinsics for performance and portability. On two typical x86 platforms, Intel Xeon-6154 and Hygon C86-7285, DH_Aligner achieves near-best accuracy and sensitivity while outperforming state-of-the-art parallel implementations, with average speedups of 7.8x, 3.4x, 2.8x to 6.7x, and 1.5x over bwa-mem, bwa-mem2, bowtie2, and minimap2, respectively.
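The abstract names the three phases but not their internals. The Python sketch below illustrates a generic seed-filter-extend pipeline under our own assumptions: exact k-mer seeds from a hash index, diagonal voting as the filter, and semi-global edit distance as the extension step. All function names and parameters (`build_index`, `seed`, `filter_seeds`, `k`, `min_votes`, `band`, `max_candidates`, `batch_size`) are hypothetical. This is not DH_Aligner's actual seeding scheme or its AVX2/AVX-512 kernels; it runs as plain scalar Python.

```python
import random
from collections import defaultdict


def build_index(ref, k):
    """Seeding index (hypothetical): map every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def seed(read, index, k):
    """Seeding: look up every k-mer of the read. A hit at reference position pos
    for read offset off votes for the diagonal pos - off, i.e. a candidate
    start of the read on the reference."""
    votes = defaultdict(int)
    for off in range(len(read) - k + 1):
        for pos in index.get(read[off:off + k], ()):  # .get avoids inserting keys
            votes[pos - off] += 1
    return votes


def filter_seeds(votes, min_votes, max_candidates):
    """Filtering: discard diagonals with too few supporting seeds and keep the
    best-supported few, so extension only runs at plausible locations."""
    kept = sorted(((v, d) for d, v in votes.items() if v >= min_votes), reverse=True)
    return [d for _, d in kept[:max_candidates]]


def semiglobal_edit_distance(read, window):
    """Extension: minimum edit distance between the whole read and any substring
    of window (leading and trailing reference bases are free)."""
    prev = [0] * (len(window) + 1)  # empty read prefix: any start is free
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(window)
        for j in range(1, len(window) + 1):
            cost = 0 if read[i - 1] == window[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match / substitution
                         prev[j] + 1,         # read base aligned to a gap
                         cur[j - 1] + 1)      # reference base aligned to a gap
        prev = cur
    return min(prev)  # free trailing reference bases


def align_read(read, ref, index, k, min_votes=2, band=4, max_candidates=3):
    """Return (candidate_start, edit_distance) for the best candidate, or None if unmapped."""
    best = None
    for diag in filter_seeds(seed(read, index, k), min_votes, max_candidates):
        lo = max(0, diag - band)
        hi = min(len(ref), diag + len(read) + band)
        dist = semiglobal_edit_distance(read, ref[lo:hi])
        if best is None or dist < best[1]:
            best = (diag, dist)
    return best


def align_batch(reads, ref, k=8, batch_size=64, **params):
    """Batched driver: reads are grouped into fixed-size batches. Here batches run
    sequentially; a real implementation could hand each batch to a thread and
    pack its extension jobs into SIMD lanes."""
    index = build_index(ref, k)
    results = []
    for start in range(0, len(reads), batch_size):
        for read in reads[start:start + batch_size]:
            results.append(align_read(read, ref, index, k, **params))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    ref = "".join(rng.choice("ACGT") for _ in range(2000))
    reads = [ref[500:540], ref[1200:1240], "N" * 40]
    print(align_batch(reads, ref, batch_size=2))
```

The filter is what keeps the quadratic extension step cheap in this sketch: only diagonals backed by several seed hits reach the dynamic-programming stage, which mirrors the abstract's motivation of not aligning reads at seemingly wrong locations. Batching has no speed benefit in sequential Python; it only shows where per-thread and per-SIMD-lane work units would be formed.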
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, again covering the full range from the design to the use of the targeted systems.