Time-Division Multiplexing for FPGA Considering CNN Model Switch Time
Tetsuro Nakamura, S. Saito, Kei Fujimoto, M. Kaneko, A. Shiraga
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), June 2021. DOI: 10.1109/IPDPSW52791.2021.00074
Abstract
With the spread of real-time data analysis by artificial intelligence (AI), accelerators in edge computing have been attracting attention for their low power consumption and low latency. In this paper, we propose a system that further reduces power consumption and cost by sharing an accelerator among multiple users while maintaining real-time performance. Four requirements are defined: high device utilization, fair device usage among users, real-time performance, and resource abstraction. Targeting an AI-inference use case, we propose a system that shares a field-programmable gate array (FPGA) among multiple users while satisfying these requirements by switching the convolutional neural network (CNN) models stored in the FPGA's device memory. The system achieves a time-division multiplexed accelerator with real-time performance and high device utilization by using a scheduling algorithm that accounts for the switch time of the CNN models. User fairness is achieved by adopting an aging technique in the scheduling algorithm, in which a job's priority increases with its waiting time. In addition, a thread manager integrated into the system absorbs the differences among CNN models to abstract the underlying hardware resources. The system was implemented on an FPGA device and evaluated to be 24-94% fairer and 31-33% more resource efficient than conventional systems using first-come-first-served and round-robin algorithms.
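The abstract describes the core scheduling idea: a job's priority grows with its waiting time (aging, for fairness), while jobs that would force the FPGA to reload a different CNN model are penalized by the model switch time (for utilization). The paper's implementation is not reproduced here, so the following Python sketch is purely illustrative of that trade-off; `Job`, `AGING_WEIGHT`, `switch_cost`, and `pick_next` are hypothetical names, and the linear scoring rule is an assumption about how the two terms might be combined, not the authors' algorithm.

```python
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    model: str      # CNN model this inference job needs loaded on the FPGA
    arrival: float  # time the job entered the queue

# Assumed weight controlling how fast priority grows with waiting time.
AGING_WEIGHT = 1.0

def switch_cost(current_model: str, next_model: str, switch_time: float) -> float:
    """Cost of swapping the CNN model in the FPGA's device memory.

    Zero if the requested model is already loaded, otherwise the full
    switch time (a simplifying assumption; the paper measures real
    per-model switch times).
    """
    return 0.0 if current_model == next_model else switch_time

def pick_next(queue: list[Job], now: float, current_model: str,
              switch_time: float) -> Job:
    """Pick the job with the best score: aging bonus minus switch penalty.

    Long-waiting jobs gain priority (fairness via aging), while jobs
    requiring a model reload are penalized (device utilization).
    """
    def score(job: Job) -> float:
        waiting = now - job.arrival
        return AGING_WEIGHT * waiting - switch_cost(current_model,
                                                    job.model, switch_time)

    best = max(queue, key=score)
    queue.remove(best)
    return best

# Example: two users contend for the FPGA; a model switch costs 2.0 time units.
q = [Job("alice", "resnet", arrival=0.0), Job("bob", "mobilenet", arrival=1.0)]
job = pick_next(q, now=3.0, current_model="mobilenet", switch_time=2.0)
# alice has waited longer (3.0 vs. 2.0) but needs a switch (penalty 2.0),
# so bob (score 2.0) beats alice (score 1.0) and runs first. As alice keeps
# waiting, aging eventually outweighs the switch penalty, avoiding starvation.
```

In this toy scoring rule, the aging term is what distinguishes the approach from plain first-come-first-served or round-robin baselines the paper compares against: waiting time, not arrival order alone, drives priority, while the switch penalty discourages needless reconfiguration.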