An Adaptive Hardware Accelerator For Convolution Layers With Diverse Sizes

Zhao Yiwei, Zou Hao, Tang Ming, Lin Qiutong
{"title":"不同尺寸卷积层的自适应硬件加速器","authors":"Zhao Yiwei, Zou Hao, Tang Ming, Lin Qiutong","doi":"10.1109/ICCWAMTIP56608.2022.10016562","DOIUrl":null,"url":null,"abstract":"Convolution is the most important operation in convolutional neural networks (CNN). FPGA-based CNN accelerators need to fully consider the optimization of convolution loops to get ideal performance. This work analyzes convolution loop optimization in detail, exploiting loop tiling, loop unrolling, and loop interchange to design the dataflow of accelerator. This work quantitatively evaluates strategies for data reuse and resource utilization, combining fixed and dynamic parallelism to design a high-performance adaptive accelerator. The proposed accelerator is evaluated on ZCU102 FPGA by implementing a five-layer CNN with large differences in convolution layer sizes. It achieves more than 1.14x improvement in throughput efficiency over prior accelerators. And the consumption of logic resources is less than half of prior accelerators while the computing resources are similar.","PeriodicalId":159508,"journal":{"name":"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Adaptive Hardware Accelerator For Convolution Layers With Diverse Sizes\",\"authors\":\"Zhao Yiwei, Zou Hao, Tang Ming, Lin Qiutong\",\"doi\":\"10.1109/ICCWAMTIP56608.2022.10016562\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolution is the most important operation in convolutional neural networks (CNN). FPGA-based CNN accelerators need to fully consider the optimization of convolution loops to get ideal performance. This work analyzes convolution loop optimization in detail, exploiting loop tiling, loop unrolling, and loop interchange to design the dataflow of accelerator. This work quantitatively evaluates strategies for data reuse and resource utilization, combining fixed and dynamic parallelism to design a high-performance adaptive accelerator. The proposed accelerator is evaluated on ZCU102 FPGA by implementing a five-layer CNN with large differences in convolution layer sizes. It achieves more than 1.14x improvement in throughput efficiency over prior accelerators. 
And the consumption of logic resources is less than half of prior accelerators while the computing resources are similar.\",\"PeriodicalId\":159508,\"journal\":{\"name\":\"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCWAMTIP56608.2022.10016562\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCWAMTIP56608.2022.10016562","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Convolution is the most important operation in convolutional neural networks (CNNs). FPGA-based CNN accelerators must fully consider the optimization of convolution loops to achieve ideal performance. This work analyzes convolution loop optimization in detail, exploiting loop tiling, loop unrolling, and loop interchange to design the accelerator's dataflow. It quantitatively evaluates strategies for data reuse and resource utilization, combining fixed and dynamic parallelism to design a high-performance adaptive accelerator. The proposed accelerator is evaluated on a ZCU102 FPGA by implementing a five-layer CNN whose convolution layers differ greatly in size. It achieves more than a 1.14x improvement in throughput efficiency over prior accelerators, and it consumes less than half the logic resources of prior accelerators while using a similar amount of computing resources.
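
To make the loop transformations named in the abstract concrete, below is a minimal, self-contained C sketch of a tiled convolution loop nest. It is illustrative only and is not the authors' actual HLS design: the tile sizes Tm and Tn, the layer dimensions M, N, R, C, K, and the chosen loop order are all assumed example values introduced here for demonstration.

/*
 * Illustrative sketch (assumed example, not the paper's design): a
 * convolution loop nest with loop tiling over output/input channels and
 * an innermost (tm, tn) block ordered so an HLS tool could fully unroll
 * it into a Tm x Tn array of parallel multiply-accumulate units.
 */
#include <stdio.h>

#define M  8   /* output channels (assumed)        */
#define N  4   /* input channels (assumed)         */
#define R  6   /* output feature rows (assumed)    */
#define C  6   /* output feature columns (assumed) */
#define K  3   /* kernel size (assumed)            */
#define Tm 4   /* output-channel tile / unroll factor (assumed) */
#define Tn 2   /* input-channel tile / unroll factor (assumed)  */

static float in[N][R + K - 1][C + K - 1];
static float w[M][N][K][K];
static float out[M][R][C];

/* The outer loops walk channel tiles (loop tiling); the middle loops over
 * the feature map and kernel are candidates for loop interchange to trade
 * weight reuse against feature-map reuse; the two innermost loops form the
 * block that would be fully unrolled on the FPGA. */
static void conv_tiled(void) {
    for (int mo = 0; mo < M; mo += Tm)            /* tile over output channels */
        for (int no = 0; no < N; no += Tn)        /* tile over input channels  */
            for (int r = 0; r < R; r++)
                for (int c = 0; c < C; c++)
                    for (int kr = 0; kr < K; kr++)
                        for (int kc = 0; kc < K; kc++)
                            for (int tm = 0; tm < Tm; tm++)      /* unrolled */
                                for (int tn = 0; tn < Tn; tn++)  /* unrolled */
                                    out[mo + tm][r][c] +=
                                        w[mo + tm][no + tn][kr][kc] *
                                        in[no + tn][r + kr][c + kc];
}

int main(void) {
    /* Fill inputs and weights with small deterministic values. */
    for (int n = 0; n < N; n++)
        for (int r = 0; r < R + K - 1; r++)
            for (int c = 0; c < C + K - 1; c++)
                in[n][r][c] = 0.01f * (float)(n + r + c);
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++)
            for (int kr = 0; kr < K; kr++)
                for (int kc = 0; kc < K; kc++)
                    w[m][n][kr][kc] = 0.1f * (float)((m + n + kr + kc) % 5);

    conv_tiled();
    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}

One plausible reading of the abstract's "fixed and dynamic parallelism" is that unroll factors such as Tm and Tn are fixed in hardware while the tile scheduling adapts to each layer's dimensions; the exact mapping, however, is not specified in the abstract.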