Parallel Accelerating Ultra-Long Read Alignment by Vertical Partitioning Data

2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP). Pub Date: 2022-11-25. DOI: 10.1109/PAAP56126.2022.10010526

Deng Pan, Cheng Zhong, Danyang Chen, Jinxiong Zhang, Feng Yang


Citations: 0

Abstract

Aligning sequencing reads to a genome is a basic task in biological big-data analysis. Reads from third-generation sequencing keep getting longer, and datasets keep getting larger. To address the ultra-long read alignment problem, which places high demands on computing and memory capacity, a strategy is proposed for vertically partitioning ultra-long reads on a hybrid CPU/GPU cluster. A heap data structure filters the local alignment results on every computing node of the cluster by alignment score, reducing the volume of data transmitted. Early termination and parallel merging-splicing are used to accelerate the splicing of local alignment results. The local alignment results from all computing nodes are then collected and extended to obtain the final alignments. Experimental results on simulated and real ultra-long read datasets show that the proposed parallel alignment algorithm achieves high overall alignment accuracy, sensitivity, and base-level sensitivity, and accelerates the alignment of ultra-long reads to the reference genome.
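The abstract gives no implementation details, so the following is only a minimal sketch of two of the ideas it names: splitting a read into pieces and using a heap to keep the best-scoring local results. It assumes that "vertical partitioning" means cutting each read into consecutive fixed-length segments and that segments are handed to nodes round-robin; the names `LocalHit`, `partition_read`, `assign_segments`, and `top_k_hits` are hypothetical and do not come from the paper.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class LocalHit(NamedTuple):
    """Hypothetical record for one local alignment of a read segment."""
    score: int       # alignment score (higher is better); compared first
    read_start: int  # offset of the segment within its read
    ref_start: int   # position of the hit on the reference


def partition_read(read: str, segment_len: int) -> List[Tuple[int, str]]:
    """Cut a read into consecutive segments, keeping each segment's offset.

    Assumption: fixed-length, non-overlapping segments; the last one may be shorter.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [(i, read[i:i + segment_len]) for i in range(0, len(read), segment_len)]


def assign_segments(segments: List[Tuple[int, str]],
                    num_nodes: int) -> List[List[Tuple[int, str]]]:
    """Distribute segments over node ranks round-robin (an assumed policy)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    buckets: List[List[Tuple[int, str]]] = [[] for _ in range(num_nodes)]
    for idx, seg in enumerate(segments):
        buckets[idx % num_nodes].append(seg)
    return buckets


def top_k_hits(hits: Iterable[LocalHit], k: int) -> List[LocalHit]:
    """Keep the k highest-scoring hits using a size-bounded min-heap.

    The heap root is always the weakest hit kept so far, so each new hit
    costs O(log k) and at most k hits ever need to be sent onward.
    """
    if k <= 0:
        return []
    heap: List[LocalHit] = []
    for hit in hits:
        if len(heap) < k:
            heapq.heappush(heap, hit)
        elif hit > heap[0]:
            heapq.heapreplace(heap, hit)  # drop the weakest, insert the new hit
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    segments = partition_read("ACGTACGTAC", 4)
    print(assign_segments(segments, 2))
    hits = [LocalHit(10, 0, 100), LocalHit(50, 4, 200),
            LocalHit(30, 8, 300), LocalHit(70, 0, 400)]
    print(top_k_hits(hits, 2))
```

Bounding the heap at `k` entries per node is one plausible way to cap the amount of data each node transmits; the paper's actual filtering threshold and partitioning scheme may differ.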

