MOCHA:基于加速器的异构云多节点成本优化

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI:10.1145/3431920.3439304

Peipei Zhou, Jiayi Sheng, Cody Hao Yu, Peng Wei, Jie Wang, Di Wu, J. Cong

{"title":"MOCHA:基于加速器的异构云多节点成本优化","authors":"Peipei Zhou, Jiayi Sheng, Cody Hao Yu, Peng Wei, Jie Wang, Di Wu, J. Cong","doi":"10.1145/3431920.3439304","DOIUrl":null,"url":null,"abstract":"FPGAs have been widely deployed in public clouds, e.g., Amazon Web Services (AWS) and Huawei Cloud. However, simply offloading accelerated kernels from CPU hosts to PCIe-based FPGAs does not guarantee out-of-pocket cost savings in a pay-as-you-go public cloud. Taking Genome Analysis Toolkit (GATK) applications as case studies, although the adoption of FPGAs reduces the overall execution time, it introduces 2.56× extra cost, due to insufficient application-level speedup by Amdahl's law. To optimize the out-of-pocket cost while keeping high speedup and throughput, we propose Mocha framework as a distributed runtime system to fully utilize the accelerator resource by accelerator sharing and CPU-FPGA partial task offloading. Evaluation results on Haplotype Caller (HTC) and Mutect2 in GATK show that on AWS, Mocha saves on the application cost by 2.82x for HTC, 1.06x for Mutect2 and on Huawei Cloud by 1.22x, 1.52x respectively than straightforward CPU-FPGA integration solution with less than 5.1% performance overhead.","PeriodicalId":386071,"journal":{"name":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"MOCHA: Multinode Cost Optimization in Heterogeneous Clouds with Accelerators\",\"authors\":\"Peipei Zhou, Jiayi Sheng, Cody Hao Yu, Peng Wei, Jie Wang, Di Wu, J. Cong\",\"doi\":\"10.1145/3431920.3439304\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"FPGAs have been widely deployed in public clouds, e.g., Amazon Web Services (AWS) and Huawei Cloud. However, simply offloading accelerated kernels from CPU hosts to PCIe-based FPGAs does not guarantee out-of-pocket cost savings in a pay-as-you-go public cloud. Taking Genome Analysis Toolkit (GATK) applications as case studies, although the adoption of FPGAs reduces the overall execution time, it introduces 2.56× extra cost, due to insufficient application-level speedup by Amdahl's law. To optimize the out-of-pocket cost while keeping high speedup and throughput, we propose Mocha framework as a distributed runtime system to fully utilize the accelerator resource by accelerator sharing and CPU-FPGA partial task offloading. Evaluation results on Haplotype Caller (HTC) and Mutect2 in GATK show that on AWS, Mocha saves on the application cost by 2.82x for HTC, 1.06x for Mutect2 and on Huawei Cloud by 1.22x, 1.52x respectively than straightforward CPU-FPGA integration solution with less than 5.1% performance overhead.\",\"PeriodicalId\":386071,\"journal\":{\"name\":\"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3431920.3439304\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3431920.3439304","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

摘要

fpga已经广泛部署在公共云中，例如亚马逊网络服务(AWS)和华为云。然而，简单地将加速内核从CPU主机卸载到基于pcie的fpga并不能保证在即用即付的公共云中节省自付成本。以Genome Analysis Toolkit (GATK)应用程序为例进行研究，尽管采用fpga减少了总体执行时间，但由于Amdahl定律的应用级加速不足，它引入了2.56倍的额外成本。为了优化自付成本，同时保持较高的加速和吞吐量，我们提出了Mocha框架作为分布式运行时系统，通过加速器共享和CPU-FPGA部分任务卸载来充分利用加速器资源。在GATK中的Haplotype Caller (HTC)和Mutect2上的评估结果表明，在AWS上，Mocha比直接的CPU-FPGA集成解决方案分别节省2.82倍的应用成本，1.06倍的Mutect2, 1.22倍的1.52倍，性能开销低于5.1%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

MOCHA: Multinode Cost Optimization in Heterogeneous Clouds with Accelerators

FPGAs have been widely deployed in public clouds, e.g., Amazon Web Services (AWS) and Huawei Cloud. However, simply offloading accelerated kernels from CPU hosts to PCIe-based FPGAs does not guarantee out-of-pocket cost savings in a pay-as-you-go public cloud. Taking Genome Analysis Toolkit (GATK) applications as case studies, although the adoption of FPGAs reduces the overall execution time, it introduces 2.56× extra cost, due to insufficient application-level speedup by Amdahl's law. To optimize the out-of-pocket cost while keeping high speedup and throughput, we propose Mocha framework as a distributed runtime system to fully utilize the accelerator resource by accelerator sharing and CPU-FPGA partial task offloading. Evaluation results on Haplotype Caller (HTC) and Mutect2 in GATK show that on AWS, Mocha saves on the application cost by 2.82x for HTC, 1.06x for Mutect2 and on Huawei Cloud by 1.22x, 1.52x respectively than straightforward CPU-FPGA integration solution with less than 5.1% performance overhead.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

自引率

0.00%

发文量