GYAN: Accelerating Bioinformatics Tools in Galaxy with GPU-Aware Computation Mapping

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Pub Date: 2021-06-01. DOI: 10.1109/IPDPSW52791.2021.00037

Gulsum Gudukbay, J. Gunasekaran, Yilin Feng, M. Kandemir, A. Nekrutenko, C. Das, P. Medvedev, B. Grüning, Nate Coraor, Nathan P Roach, E. Afgan


Citations: 2

Abstract


GYAN: Accelerating Bioinformatics Tools in Galaxy with GPU-Aware Computation Mapping

Galaxy is an open-source, web-based framework that is widely used for computational analyses in diverse application domains, such as genome assembly, computational chemistry, ecology, and epigenetics. The current Galaxy software framework runs on several high-performance computing platforms, including on-premises clusters, public data centers, and national lab supercomputers. These infrastructures also support state-of-the-art accelerators such as Graphics Processing Units (GPUs). With accelerator support, the tools executing in Galaxy can achieve large reductions in computation time, giving researchers a more robust computational analysis environment. Although many tools have GPU capabilities, the current Galaxy framework does not support GPUs, and thus prevents those tools from taking advantage of the performance benefits GPUs offer. We present and experimentally evaluate GYAN, a GPU-aware computation mapping and orchestration functionality implemented in Galaxy that allows Galaxy tools to be executed on a GPU-enabled cluster. GYAN identifies GPU-supported tools and schedules them on single or multiple GPU nodes based on availability in the cluster. GYAN supports both native and containerized tool execution. We performed extensive evaluations of the implementation using popular bio-engineering tools to demonstrate the benefits of GPU technologies. For example, the Racon consensus tool executes ~2× faster than the baseline CPU-only jobs, while the Bonito base calling tool shows a ~50× speedup.
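The mapping step described in the abstract (identify GPU-supported tools, then place them on one or more GPUs depending on availability) can be illustrated with a minimal sketch. This is not GYAN's implementation or any Galaxy API: the `Tool` and `Gpu` classes, the `gpu_capable` flag, and the `map_tool` function are hypothetical names introduced only for illustration, and the fallback to CPU when no GPU is free is an assumption. The only real interface used is the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA applications read to decide which devices they may see.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tool:
    name: str
    gpu_capable: bool        # hypothetical flag marking a GPU-supported tool
    gpus_requested: int = 1  # hypothetical: how many GPUs the tool would like


@dataclass
class Gpu:
    index: int  # device index as CUDA would number it
    busy: bool  # assumed to come from some cluster monitoring step


def map_tool(tool: Tool, gpus: List[Gpu]) -> Dict[str, str]:
    """Return environment overrides for one job (illustrative only)."""
    # CPU-only tools get no devices: an empty value hides all GPUs.
    if not tool.gpu_capable:
        return {"CUDA_VISIBLE_DEVICES": ""}

    free = [g.index for g in gpus if not g.busy]

    # Assumption: with no free GPU, the job falls back to CPU execution.
    if not free:
        return {"CUDA_VISIBLE_DEVICES": ""}

    # Take up to the requested number of free devices, lowest index first.
    chosen = free[: tool.gpus_requested]
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in chosen)}
```

A caller would merge the returned dictionary into the job's environment before launching the tool, so a GPU-capable tool sees only the devices assigned to it. A real scheduler would also need to track device memory and release devices when jobs finish, which this sketch leaves out.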
