{"title":"基于Winograd的高性能卷积神经网络加速器架构","authors":"Vardhana M;Rohan Pinto","doi":"10.1109/LCA.2025.3525970","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks are deployed mostly on GPUs or CPUs. However, due to the increasing complexity of architecture and growing performance requirements, these platforms may not be suitable for deploying inference engines. ASIC and FPGA implementations are appearing as superior alternatives to software-based solutions for achieving the required performance. In this article, an efficient architecture for accelerating convolution using the Winograd transform is proposed and implemented on FPGA. The proposed accelerator consumes 38% less resources as compared with conventional GEMM-based implementation. Analysis results indicate that our accelerator can achieve 3.5 TOP/s, 1.28 TOP/s, and 1.42 TOP/s for VGG16, ResNet18, and MobileNetV2 CNNs, respectively, at 250 MHz. The proposed accelerator demonstrates the best energy efficiency as compared with prior arts.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 1","pages":"21-24"},"PeriodicalIF":1.4000,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network\",\"authors\":\"Vardhana M;Rohan Pinto\",\"doi\":\"10.1109/LCA.2025.3525970\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Networks are deployed mostly on GPUs or CPUs. However, due to the increasing complexity of architecture and growing performance requirements, these platforms may not be suitable for deploying inference engines. ASIC and FPGA implementations are appearing as superior alternatives to software-based solutions for achieving the required performance. In this article, an efficient architecture for accelerating convolution using the Winograd transform is proposed and implemented on FPGA. The proposed accelerator consumes 38% less resources as compared with conventional GEMM-based implementation. Analysis results indicate that our accelerator can achieve 3.5 TOP/s, 1.28 TOP/s, and 1.42 TOP/s for VGG16, ResNet18, and MobileNetV2 CNNs, respectively, at 250 MHz. 
The proposed accelerator demonstrates the best energy efficiency as compared with prior arts.\",\"PeriodicalId\":51248,\"journal\":{\"name\":\"IEEE Computer Architecture Letters\",\"volume\":\"24 1\",\"pages\":\"21-24\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2025-01-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Computer Architecture Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10833703/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Computer Architecture Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10833703/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network
Convolutional Neural Networks are mostly deployed on GPUs or CPUs. However, due to increasing architectural complexity and growing performance requirements, these platforms may not be suitable for deploying inference engines. ASIC and FPGA implementations are emerging as superior alternatives to software-based solutions for achieving the required performance. In this article, an efficient architecture for accelerating convolution using the Winograd transform is proposed and implemented on an FPGA. The proposed accelerator consumes 38% fewer resources than a conventional GEMM-based implementation. Analysis results indicate that the accelerator can achieve 3.5 TOP/s, 1.28 TOP/s, and 1.42 TOP/s for the VGG16, ResNet18, and MobileNetV2 CNNs, respectively, at 250 MHz. The proposed accelerator demonstrates the best energy efficiency compared with prior art.
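For context on the underlying technique, below is a minimal NumPy sketch of a single Winograd F(2x2, 3x3) tile convolution in the standard Lavin-Gray formulation. It is an illustration of the general Winograd minimal-filtering idea, not the authors' accelerator design; the function name winograd_f2x2_3x3 and the test values are invented for this sketch. It shows the arithmetic saving a hardware implementation exploits: 16 element-wise multiplications per 2x2 output tile instead of the 36 multiplications needed by direct 3x3 convolution.

import numpy as np

# Winograd F(2x2, 3x3) transform matrices (Lavin & Gray formulation).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f2x2_3x3(tile, kernel):
    """Compute a 2x2 output tile of a 3x3 convolution on a 4x4 input tile.

    The element-wise product in the transform domain uses 16 multiplications,
    versus 36 for computing the same 2x2 output by direct 3x3 convolution.
    """
    U = G @ kernel @ G.T      # 4x4 transformed kernel
    V = B_T @ tile @ B_T.T    # 4x4 transformed input tile
    M = U * V                 # element-wise product (the cheap multiplies)
    return A_T @ M @ A_T.T    # inverse transform to the 2x2 output tile

# Sanity check against direct (valid, stride-1) convolution on random data.
rng = np.random.default_rng(0)
tile = rng.standard_normal((4, 4)).astype(np.float32)
kernel = rng.standard_normal((3, 3)).astype(np.float32)
direct = np.array([[np.sum(tile[i:i+3, j:j+3] * kernel) for j in range(2)]
                   for i in range(2)], dtype=np.float32)
assert np.allclose(winograd_f2x2_3x3(tile, kernel), direct, atol=1e-4)

In a full layer, the kernel transform U can be precomputed once per filter and the input tiles overlap by two pixels, so the transform cost is amortized while the multiplication count per output stays at the reduced figure above.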
About the journal:
IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.