Shulin Zeng, Guohao Dai, Hanbo Sun, Jun Liu, Hongren Zheng, Yusong Wu, Fan Zhang, Xinhao Yang, Yi Cai, Yu Wang, Huazhong Yang
3M-AI: A Multi-task and Multi-core Virtualization Framework for Multi-FPGA AI Systems in the Cloud
The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, published 2021-02-17
DOI: 10.1145/3431920.3439480
Citations: 2
Abstract
With the ever-growing demand for online Artificial Intelligence (AI), hardware virtualization support for deep learning accelerators is vital for providing AI capability in the cloud. Three basic features, multi-task support, dynamic workloads, and remote access, are fundamental for hardware virtualization. However, most deep learning accelerators do not support concurrent execution of multiple tasks. Moreover, state-of-the-art multi-DNN scheduling algorithms for NN accelerators consider neither multi-task concurrent execution nor resource allocation for multi-core DNN accelerators. In addition, existing GPU virtualization solutions can introduce a large remote access latency overhead, resulting in a severe drop in system performance. To tackle these challenges, we propose 3M-AI, a Multi-task and Multi-core virtualization framework for Multi-FPGA AI systems in the cloud. 3M-AI enables model parallelism across multiple FPGAs by optimizing data synchronization and movement between FPGAs. 3M-AI exploits a heuristic hardware resource allocation algorithm and an accurate multi-core latency prediction model. 3M-AI significantly reduces the remote API access overhead to nearly 1%, and achieves better NN inference latency at batch size 1 than GPU virtualization solutions.
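The abstract names two components, a heuristic hardware resource allocation algorithm and a multi-core latency prediction model, without detailing them. The sketch below is purely illustrative and is not the paper's actual algorithm: it shows one plausible shape of such a heuristic, greedily handing out accelerator cores to whichever concurrent DNN task a toy latency model predicts to be the current bottleneck. All function names, constants, and workload numbers are assumptions for illustration.

```python
def predicted_latency(workload, cores):
    """Toy latency model (an assumption, not the paper's model): compute
    time shrinks with core count, plus a per-core synchronization cost
    modeling inter-core data movement."""
    sync_overhead = 0.1 * (cores - 1)
    return workload / cores + sync_overhead


def allocate_cores(workloads, total_cores):
    """Greedy heuristic: give every task one core, then repeatedly grant a
    spare core to the task with the worst predicted latency."""
    alloc = {task: 1 for task in workloads}  # each task needs >= 1 core
    for _ in range(total_cores - len(workloads)):
        # Find the current latency bottleneck and give it one more core.
        worst = max(workloads,
                    key=lambda t: predicted_latency(workloads[t], alloc[t]))
        alloc[worst] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical relative workloads for three concurrent inference tasks.
    tasks = {"resnet50": 8.0, "bert": 12.0, "yolo": 4.0}
    print(allocate_cores(tasks, 8))
```

Under this toy model, the heaviest task (here `bert`) ends up with the most cores, which matches the intuition that a multi-task scheduler should balance predicted per-task latencies rather than split cores evenly.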