{"title":"An hardware accelerator design of Mobile-Net model on FPGA","authors":"Sanjaya M V, M. Rao","doi":"10.1145/3564121.3564124","DOIUrl":null,"url":null,"abstract":"Domain specific hardware architectures and hardware accelerators have been a vital part of modern system design. Especially for math intensive applications involving tasks related to machine perception, incorporating hardware accelerators that work in tandem with general purpose micro-processors can prove to be energy efficient both at server and edge scenarios. FPGAs, due to their reconfigurability makes it possible to have customized hardware designed as per the computational and memory requirements specific to that application. This work proposes an optimized low latency hardware accelerator implementation of Mobile-net V2 CNN on an FPGA. This paper presents an implementation of Mobile-net-V2 inference on a Xilinx Ultrascale+ MPSOC platform incorporating solely half precision floating point arithmetic for both parameters and activations of the network. The proposed implementation is also optimized by merging all batch-norm layers with its preceding convolutional layers. For applications which cannot compromise on performance of the algorithm for execution speed and efficiency, an optimized floating point inference is proposed. The current implementation offers an overall performance improvement of at-least 20X with moderate resource utilization with minimal variance in inference latency, as compared to performing inference on the processor alone with almost no degradation in the model accuracy.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second International Conference on AI-ML Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3564121.3564124","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Domain-specific hardware architectures and hardware accelerators have become a vital part of modern system design. Especially for math-intensive applications involving machine-perception tasks, hardware accelerators that work in tandem with general-purpose microprocessors can prove energy efficient in both server and edge scenarios. FPGAs, owing to their reconfigurability, make it possible to design customized hardware tailored to the computational and memory requirements of a specific application. This work proposes an optimized, low-latency hardware accelerator implementation of the MobileNet-V2 CNN on an FPGA. The paper presents an implementation of MobileNet-V2 inference on a Xilinx UltraScale+ MPSoC platform that uses half-precision floating-point arithmetic exclusively for both the parameters and the activations of the network. The implementation is further optimized by merging every batch-norm layer into its preceding convolutional layer. For applications that cannot compromise the algorithm's accuracy for execution speed and efficiency, optimized floating-point inference is proposed. The current implementation offers an overall performance improvement of at least 20× over inference on the processor alone, with moderate resource utilization, minimal variance in inference latency, and almost no degradation in model accuracy.
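The batch-norm folding and the FP16 parameter format mentioned above are standard offline transformations; the sketch below is a minimal NumPy illustration, not taken from the paper, of how a batch-norm layer can be merged into its preceding convolution and the resulting parameters cast to half precision. The function name, weight layout (out_ch, in_ch, kh, kw), and shapes are assumptions made for illustration.

```python
# Minimal sketch (assumed details, not the authors' code) of folding a
# batch-norm layer into the preceding convolution and casting to FP16.
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta
    into the preceding convolution y = W * x + b.

    W: conv weights of shape (out_ch, in_ch, kh, kw)
    b: conv bias of shape (out_ch,); pass zeros if the conv has no bias.
    Returns (W', b') such that conv(x, W') + b' == BN(conv(x, W) + b).
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
    W_folded = W * scale[:, None, None, None]   # scale each output filter
    b_folded = (b - mean) * scale + beta        # shift the bias accordingly
    return W_folded, b_folded

# Hypothetical layer shapes, purely illustrative.
out_ch, in_ch, k = 16, 8, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((out_ch, in_ch, k, k)).astype(np.float32)
b = np.zeros(out_ch, dtype=np.float32)
gamma = rng.standard_normal(out_ch).astype(np.float32)
beta = rng.standard_normal(out_ch).astype(np.float32)
mean = rng.standard_normal(out_ch).astype(np.float32)
var = rng.random(out_ch).astype(np.float32) + 0.5

W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)

# Cast the folded parameters to half precision, mirroring the abstract's
# choice of FP16 for both weights and activations on the accelerator.
W_f16, b_f16 = W_f.astype(np.float16), b_f.astype(np.float16)
print(W_f16.dtype, b_f16.dtype)  # float16 float16
```

Because the folded scale and shift are absorbed into the convolution weights and bias offline, the accelerator never has to implement batch-norm at inference time, which removes one elementwise pass per layer.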