Efficient Hardware Acceleration of Convolutional Neural Networks

S. Kala, B. R. Jose, J. Mathew, N. Sivanandan
2019 32nd IEEE International System-on-Chip Conference (SOCC), September 2019
DOI: 10.1109/SOCC46988.2019.1570573948
Citations: 3

Abstract

Convolutional neural networks (CNNs) have emerged as one of the most effective techniques for solving a host of machine learning tasks. The compute- and memory-intensive nature of CNNs has stimulated a large body of work on hardware acceleration of these network models. FPGAs have emerged as a promising platform for accelerating CNNs due to their high performance, flexibility, and energy efficiency. We propose a unified architecture named UniWiG, in which both Winograd-based convolution and general matrix multiplication (GEMM) are accelerated using the same set of processing elements. The proposed architecture has been used to accelerate the AlexNet and VGG-16 models on an FPGA, achieving 433.63 GOPS and 407.23 GOPS respectively. We have also analyzed performance with varying Winograd tile sizes and identified the tile sizes that maximize performance while reducing on-chip memory usage.
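The paper itself does not include code, but the key idea behind Winograd-based convolution can be illustrated with a minimal NumPy sketch of the standard F(2, 3) transform (the smallest 1-D case): two outputs of a 3-tap filter are computed with 4 multiplications instead of 6, using fixed input, filter, and output transform matrices. The matrices below are the well-known F(2, 3) transforms, not taken from this paper, and the function names are illustrative only.

```python
import numpy as np

# Standard Winograd F(2, 3) transform matrices: two outputs of a 1-D
# 3-tap valid correlation from a 4-sample input tile, using only
# 4 elementwise multiplications (vs. 6 for the direct method).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])                    # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)    # output transform

def winograd_f23(d, g):
    """Two outputs of the valid correlation of 4-sample tile d with 3-tap filter g."""
    U = G @ g            # filter transform (precomputable, reused across tiles)
    V = BT @ d           # input transform
    return AT @ (U * V)  # 4 multiplies, then output transform

def direct_conv(d, g):
    """Direct computation of the same two outputs, for comparison."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

Because the transformed computation reduces to an elementwise multiply between transformed tiles, a 2-D Winograd layer can be batched into matrix multiplications over channels, which is why a single GEMM-style array of processing elements (as in UniWiG) can serve both the Winograd and the plain GEMM paths.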