Benchmarking accelerated next-generation sequencing analysis pipelines.

Impact factor: 2.4 | JCR quartile: Q2 | Category: Mathematical & Computational Biology
Journal: Bioinformatics Advances | Publication date: 2025-05-15 | eCollection date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf085
Pubudu Saneth Samarakoon, Ghislain Fournous, Lars T Hansen, Ashen Wijesiri, Sen Zhao, Rodriguez Alex A, Tarak Nath Nandi, Ravi Madduri, Alexander D Rowe, Gard Thomassen, Eivind Hovig, Sabry Razick
Volume 5, issue 1, article vbaf085. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12092081/pdf/
Citation count: 0

Abstract
Benchmarking accelerated next-generation sequencing analysis pipelines.

Motivation: Industry-standard central processing unit (CPU)-based next-generation sequencing (NGS) analysis tools have led to longer runtimes, affecting their utility in time-sensitive clinical practices and population-scale research studies. To address this, researchers have developed accelerated NGS platforms like DRAGEN and Parabricks, which have significantly reduced runtimes, from days to hours. However, these studies have evaluated accelerated platforms independently without sufficiently assessing computational resource usage or thoroughly investigating speedup scalability, a gap our study is designed to address.

Results: Corroborating previous studies, accelerated pipelines demonstrated shorter runtimes than CPU-only approaches, with Parabricks-H100 demonstrating the highest speedups, followed by DRAGEN. In mapping, DRAGEN outperformed Parabricks (L4 and A100) and matched H100 speedups. Parabricks (A100 and H100) variant calling demonstrated higher speedups than DRAGEN. Moreover, DRAGEN and Parabricks-H100 mapping showed positive trends in the coverage-based scalability analysis, while other configurations failed to scale effectively. Our profiler analysis provided new insights into the relationships between Parabricks' performance and resource usage patterns, revealing its potential for further improvements. Our findings and cost comparison help researchers select accelerated platforms based on coverage needs, timeframes, and budget, while suggesting optimization strategies.

Availability and implementation: Datasets are described in the 'Data availability' section. Our NGS pipelines are available at https://github.com/NAICNO/accelerated_genomics.
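
The abstract reports speedups and a coverage-based scalability trend without giving the formulas. The sketch below shows one common way such quantities can be computed. It assumes that speedup is the ratio of CPU-only runtime to accelerated runtime, and that a "positive trend" means a positive least-squares slope of speedup against coverage. Both definitions are assumptions, not confirmed by the abstract. The function names and all runtime values are hypothetical placeholders, not results from the study.

```python
def speedup(cpu_runtime_h: float, accelerated_runtime_h: float) -> float:
    """Speedup as baseline runtime divided by accelerated runtime (assumed definition)."""
    if cpu_runtime_h <= 0 or accelerated_runtime_h <= 0:
        raise ValueError("runtimes must be positive")
    return cpu_runtime_h / accelerated_runtime_h


def speedup_trend(coverages: list[float], speedups: list[float]) -> float:
    """Least-squares slope of speedup versus coverage.

    A positive slope means the speedup grows as coverage increases
    (an assumed reading of "positive trend" in a scalability analysis).
    """
    n = len(coverages)
    if n < 2 or n != len(speedups):
        raise ValueError("need at least two matching (coverage, speedup) points")
    mean_x = sum(coverages) / n
    mean_y = sum(speedups) / n
    sxx = sum((x - mean_x) ** 2 for x in coverages)
    if sxx == 0:
        raise ValueError("coverages must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(coverages, speedups))
    return sxy / sxx


# Hypothetical placeholder runtimes in hours, keyed by sequencing coverage (x).
# These numbers are invented for illustration and are NOT taken from the paper.
cpu_runtimes = {10: 4.0, 30: 12.0, 60: 24.0}
accelerated_runtimes = {10: 0.5, 30: 1.0, 60: 1.5}

coverages = sorted(cpu_runtimes)
speedups = [speedup(cpu_runtimes[c], accelerated_runtimes[c]) for c in coverages]
slope = speedup_trend(coverages, speedups)

for c, s in zip(coverages, speedups):
    print(f"{c}x coverage: speedup {s:.1f}")
print("trend:", "positive" if slope > 0 else "flat or negative")
```

Using a slope keeps the comparison to a single number per platform; a real analysis would likely also inspect the per-coverage values, since a line fitted to a few points can hide non-linear behaviour.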
