Deng Pan, Cheng Zhong, Danyang Chen, Jinxiong Zhang, Feng Yang
2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), published 2022-11-25. DOI: 10.1109/PAAP56126.2022.10010526 (https://doi.org/10.1109/PAAP56126.2022.10010526)
Parallel Accelerating Ultra-Long Read Alignment by Vertical Partitioning Data
Aligning sequencing reads to a genome is a fundamental task in biological big data analysis. Reads produced by third-generation sequencing keep getting longer, and datasets keep getting larger. To address the heavy computing and memory demands of ultra-long read alignment, a strategy for vertically partitioning ultra-long reads on a hybrid CPU/GPU cluster is proposed. A heap data structure filters the local alignment results on every computing node of the cluster by alignment score, which reduces the amount of data transmitted. Early termination and parallel merging-splicing are used to accelerate the splicing of local alignment results. The local alignment results from all computing nodes are then collected and extended to obtain the final alignments. Experimental results on simulated and real ultra-long read datasets show that the proposed parallel alignment algorithm achieves high overall alignment accuracy, sensitivity, and base-level sensitivity, and speeds up the alignment of ultra-long reads to the reference genome.
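The abstract names two ideas that a small sketch may help make concrete: cutting an ultra-long read into pieces for different nodes, and using a heap to keep only the best-scoring local results before sending them on. The Python below is an illustration under stated assumptions, not the authors' implementation. It assumes that "vertical partitioning" means splitting each read along its length into contiguous segments, and that the heap filter keeps the k highest-scoring hits per node; the abstract does not spell out either detail. The names `LocalHit`, `partition_read`, and `filter_top_k` are hypothetical.

```python
import heapq
from typing import List, NamedTuple


class LocalHit(NamedTuple):
    """Hypothetical record for one local alignment found on a node."""
    score: int      # alignment score (higher is better); first field, so it drives ordering
    read_id: str    # read the segment came from
    seg_index: int  # index of the segment within the read
    ref_pos: int    # start position on the reference


def partition_read(read: str, num_nodes: int) -> List[str]:
    """Cut one read into num_nodes contiguous segments of near-equal length.

    Assumption: this is one plausible reading of "vertical partitioning".
    """
    base, extra = divmod(len(read), num_nodes)
    segments, start = [], 0
    for i in range(num_nodes):
        # The first `extra` segments get one additional base.
        end = start + base + (1 if i < extra else 0)
        segments.append(read[start:end])
        start = end
    return segments


def filter_top_k(hits: List[LocalHit], k: int) -> List[LocalHit]:
    """Keep the k highest-scoring hits using a bounded min-heap.

    Assumption: a top-k policy; the abstract only says a heap filters by score.
    """
    if k <= 0:
        return []
    heap: List[LocalHit] = []
    for hit in hits:
        if len(heap) < k:
            heapq.heappush(heap, hit)
        elif hit > heap[0]:
            # The new hit beats the current worst kept hit, so swap it in.
            heapq.heapreplace(heap, hit)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    segments = partition_read("ACGTACGTAC", 3)
    print(segments)
    hits = [
        LocalHit(5, "r1", 0, 100),
        LocalHit(9, "r1", 1, 104),
        LocalHit(1, "r1", 2, 900),
        LocalHit(7, "r1", 2, 107),
    ]
    print([h.score for h in filter_top_k(hits, 2)])
```

With a bounded heap, each node holds at most k hits in memory and spends O(log k) per candidate, so the amount of data it forwards is fixed by k rather than by the number of candidates it generated.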