An Adaptive Hardware Accelerator For Convolution Layers With Diverse Sizes

Zhao Yiwei, Zou Hao, Tang Ming, Lin Qiutong
{"title":"不同尺寸卷积层的自适应硬件加速器","authors":"Zhao Yiwei, Zou Hao, Tang Ming, Lin Qiutong","doi":"10.1109/ICCWAMTIP56608.2022.10016562","DOIUrl":null,"url":null,"abstract":"Convolution is the most important operation in convolutional neural networks (CNN). FPGA-based CNN accelerators need to fully consider the optimization of convolution loops to get ideal performance. This work analyzes convolution loop optimization in detail, exploiting loop tiling, loop unrolling, and loop interchange to design the dataflow of accelerator. This work quantitatively evaluates strategies for data reuse and resource utilization, combining fixed and dynamic parallelism to design a high-performance adaptive accelerator. The proposed accelerator is evaluated on ZCU102 FPGA by implementing a five-layer CNN with large differences in convolution layer sizes. It achieves more than 1.14x improvement in throughput efficiency over prior accelerators. And the consumption of logic resources is less than half of prior accelerators while the computing resources are similar.","PeriodicalId":159508,"journal":{"name":"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Adaptive Hardware Accelerator For Convolution Layers With Diverse Sizes\",\"authors\":\"Zhao Yiwei, Zou Hao, Tang Ming, Lin Qiutong\",\"doi\":\"10.1109/ICCWAMTIP56608.2022.10016562\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolution is the most important operation in convolutional neural networks (CNN). FPGA-based CNN accelerators need to fully consider the optimization of convolution loops to get ideal performance. This work analyzes convolution loop optimization in detail, exploiting loop tiling, loop unrolling, and loop interchange to design the dataflow of accelerator. This work quantitatively evaluates strategies for data reuse and resource utilization, combining fixed and dynamic parallelism to design a high-performance adaptive accelerator. The proposed accelerator is evaluated on ZCU102 FPGA by implementing a five-layer CNN with large differences in convolution layer sizes. It achieves more than 1.14x improvement in throughput efficiency over prior accelerators. 
And the consumption of logic resources is less than half of prior accelerators while the computing resources are similar.\",\"PeriodicalId\":159508,\"journal\":{\"name\":\"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCWAMTIP56608.2022.10016562\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCWAMTIP56608.2022.10016562","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Convolution is the most important operation in convolutional neural networks (CNNs). FPGA-based CNN accelerators must fully consider the optimization of convolution loops to achieve ideal performance. This work analyzes convolution loop optimization in detail, exploiting loop tiling, loop unrolling, and loop interchange to design the accelerator's dataflow. It quantitatively evaluates strategies for data reuse and resource utilization, combining fixed and dynamic parallelism to design a high-performance adaptive accelerator. The proposed accelerator is evaluated on a ZCU102 FPGA by implementing a five-layer CNN whose convolution layers differ greatly in size. It achieves more than a 1.14x improvement in throughput efficiency over prior accelerators, and it consumes less than half the logic resources of prior accelerators while using a similar amount of computing resources.
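
To make the loop transformations named in the abstract concrete, below is a minimal, self-contained C sketch of a tiled convolution loop nest. It is illustrative only and is not the authors' actual HLS design: the tile sizes Tm and Tn, the layer dimensions M, N, R, C, K, and the chosen loop order are all assumed example values introduced here for demonstration.

/*
 * Illustrative sketch (assumed example, not the paper's design): a
 * convolution loop nest with loop tiling over output/input channels and
 * an innermost (tm, tn) block ordered so an HLS tool could fully unroll
 * it into a Tm x Tn array of parallel multiply-accumulate units.
 */
#include <stdio.h>

#define M  8   /* output channels (assumed)        */
#define N  4   /* input channels (assumed)         */
#define R  6   /* output feature rows (assumed)    */
#define C  6   /* output feature columns (assumed) */
#define K  3   /* kernel size (assumed)            */
#define Tm 4   /* output-channel tile / unroll factor (assumed) */
#define Tn 2   /* input-channel tile / unroll factor (assumed)  */

static float in[N][R + K - 1][C + K - 1];
static float w[M][N][K][K];
static float out[M][R][C];

/* The outer loops walk channel tiles (loop tiling); the middle loops over
 * the feature map and kernel are candidates for loop interchange to trade
 * weight reuse against feature-map reuse; the two innermost loops form the
 * block that would be fully unrolled on the FPGA. */
static void conv_tiled(void) {
    for (int mo = 0; mo < M; mo += Tm)            /* tile over output channels */
        for (int no = 0; no < N; no += Tn)        /* tile over input channels  */
            for (int r = 0; r < R; r++)
                for (int c = 0; c < C; c++)
                    for (int kr = 0; kr < K; kr++)
                        for (int kc = 0; kc < K; kc++)
                            for (int tm = 0; tm < Tm; tm++)      /* unrolled */
                                for (int tn = 0; tn < Tn; tn++)  /* unrolled */
                                    out[mo + tm][r][c] +=
                                        w[mo + tm][no + tn][kr][kc] *
                                        in[no + tn][r + kr][c + kc];
}

int main(void) {
    /* Fill inputs and weights with small deterministic values. */
    for (int n = 0; n < N; n++)
        for (int r = 0; r < R + K - 1; r++)
            for (int c = 0; c < C + K - 1; c++)
                in[n][r][c] = 0.01f * (float)(n + r + c);
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++)
            for (int kr = 0; kr < K; kr++)
                for (int kc = 0; kc < K; kc++)
                    w[m][n][kr][kc] = 0.1f * (float)((m + n + kr + kc) % 5);

    conv_tiled();
    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}

One plausible reading of the abstract's "fixed and dynamic parallelism" is that unroll factors such as Tm and Tn are fixed in hardware while the tile scheduling adapts to each layer's dimensions; the exact mapping, however, is not specified in the abstract.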