Time-Division Multiplexing for FPGA Considering CNN Model Switch Time
Tetsuro Nakamura, S. Saito, Kei Fujimoto, M. Kaneko, A. Shiraga
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), June 2021. DOI: 10.1109/IPDPSW52791.2021.00074
Abstract
With the spread of real-time data analysis by artificial intelligence (AI), accelerators in edge computing have been attracting attention for their low power consumption and low latency. In this paper, we propose a system that further reduces power consumption and cost by sharing an accelerator among multiple users while maintaining real-time performance. Four requirements are defined: high device utilization, fair device usage among users, real-time performance, and resource abstraction. Targeting an AI-inference use case, we propose a system that shares a field-programmable gate array (FPGA) among multiple users while satisfying these requirements by switching the convolutional neural network (CNN) models stored in the FPGA's device memory. The system achieves a time-division multiplexed accelerator with real-time performance and high device utilization by using a scheduling algorithm that accounts for the switch time of the CNN models. User fairness is achieved by adopting an aging technique in the scheduling algorithm, in which a job's priority increases with its waiting time. In addition, a thread manager integrated into the system absorbs the differences among CNN models to abstract the underlying hardware resources. The system was implemented on an FPGA device and evaluated to be 24-94% fairer and 31-33% more resource efficient than conventional systems using first-come-first-served and round-robin algorithms.
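The abstract describes the core scheduling idea: a job's priority grows with its waiting time (aging, for fairness), while jobs that would force the FPGA to reload a different CNN model are penalized by the model switch time (for utilization). The paper's implementation is not reproduced here, so the following Python sketch is purely illustrative of that trade-off; `Job`, `AGING_WEIGHT`, `switch_cost`, and `pick_next` are hypothetical names, and the linear scoring rule is an assumption about how the two terms might be combined, not the authors' algorithm.

```python
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    model: str      # CNN model this inference job needs loaded on the FPGA
    arrival: float  # time the job entered the queue

# Assumed weight controlling how fast priority grows with waiting time.
AGING_WEIGHT = 1.0

def switch_cost(current_model: str, next_model: str, switch_time: float) -> float:
    """Cost of swapping the CNN model in the FPGA's device memory.

    Zero if the requested model is already loaded, otherwise the full
    switch time (a simplifying assumption; the paper measures real
    per-model switch times).
    """
    return 0.0 if current_model == next_model else switch_time

def pick_next(queue: list[Job], now: float, current_model: str,
              switch_time: float) -> Job:
    """Pick the job with the best score: aging bonus minus switch penalty.

    Long-waiting jobs gain priority (fairness via aging), while jobs
    requiring a model reload are penalized (device utilization).
    """
    def score(job: Job) -> float:
        waiting = now - job.arrival
        return AGING_WEIGHT * waiting - switch_cost(current_model,
                                                    job.model, switch_time)

    best = max(queue, key=score)
    queue.remove(best)
    return best

# Example: two users contend for the FPGA; a model switch costs 2.0 time units.
q = [Job("alice", "resnet", arrival=0.0), Job("bob", "mobilenet", arrival=1.0)]
job = pick_next(q, now=3.0, current_model="mobilenet", switch_time=2.0)
# alice has waited longer (3.0 vs. 2.0) but needs a switch (penalty 2.0),
# so bob (score 2.0) beats alice (score 1.0) and runs first. As alice keeps
# waiting, aging eventually outweighs the switch penalty, avoiding starvation.
```

In this toy scoring rule, the aging term is what distinguishes the approach from plain first-come-first-served or round-robin baselines the paper compares against: waiting time, not arrival order alone, drives priority, while the switch penalty discourages needless reconfiguration.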