Search-free Accelerator for Sparse Convolutional Neural Networks

Bosheng Liu, Xiaoming Chen, Yinhe Han, Ying Wang, Jiajun Li, Haobo Xu, Xiaowei Li
{"title":"稀疏卷积神经网络的无搜索加速器","authors":"Bosheng Liu, Xiaoming Chen, Yinhe Han, Ying Wang, Jiajun Li, Haobo Xu, Xiaowei Li","doi":"10.1109/ASP-DAC47756.2020.9045580","DOIUrl":null,"url":null,"abstract":"Sparsification is an efficient solution to reduce the demand of on-chip memory space for deep convolutional neural networks (CNNs). Most of state-of-the-art CNN accelerators can deliver high throughput for sparse CNNs by searching pairs of nonzero weights and activations, and then sending them to processing elements (PEs) for multiplication-accumulation (MAC) operations. However, their PE scales are difficult to be increased for superior and efficient computing because of the significant internal interconnect and memory bandwidth consumption. To deal with this dilemma, we propose a sparsity-aware architecture, called Swan, which frees the search process for sparse CNNs under limited interconnect and bandwidth resources. The architecture comprises two parts: a MAC unit that can free the search operation for the sparsity-aware MAC calculation, and a systolic compressive dataflow that well suits the MAC architecture and greatly reuses inputs for interconnect and bandwidth saving. With the proposed architecture, only one column of the PEs needs to load/store data while all PEs can operate in full scale. Evaluation results based on a place-and-route process show that the proposed design, in a compact factor of 4096 PEs, 4.9TOP/s peak performance, and 2.97W power running at 600MHz, achieves 1.5-2.1× speedup and 6.0-9.1× higher energy efficiency than state-of-the-art CNN accelerators with the same PE scale.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Search-free Accelerator for Sparse Convolutional Neural Networks\",\"authors\":\"Bosheng Liu, Xiaoming Chen, Yinhe Han, Ying Wang, Jiajun Li, Haobo Xu, Xiaowei Li\",\"doi\":\"10.1109/ASP-DAC47756.2020.9045580\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sparsification is an efficient solution to reduce the demand of on-chip memory space for deep convolutional neural networks (CNNs). Most of state-of-the-art CNN accelerators can deliver high throughput for sparse CNNs by searching pairs of nonzero weights and activations, and then sending them to processing elements (PEs) for multiplication-accumulation (MAC) operations. However, their PE scales are difficult to be increased for superior and efficient computing because of the significant internal interconnect and memory bandwidth consumption. To deal with this dilemma, we propose a sparsity-aware architecture, called Swan, which frees the search process for sparse CNNs under limited interconnect and bandwidth resources. The architecture comprises two parts: a MAC unit that can free the search operation for the sparsity-aware MAC calculation, and a systolic compressive dataflow that well suits the MAC architecture and greatly reuses inputs for interconnect and bandwidth saving. With the proposed architecture, only one column of the PEs needs to load/store data while all PEs can operate in full scale. 
Evaluation results based on a place-and-route process show that the proposed design, in a compact factor of 4096 PEs, 4.9TOP/s peak performance, and 2.97W power running at 600MHz, achieves 1.5-2.1× speedup and 6.0-9.1× higher energy efficiency than state-of-the-art CNN accelerators with the same PE scale.\",\"PeriodicalId\":125112,\"journal\":{\"name\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASP-DAC47756.2020.9045580\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC47756.2020.9045580","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Sparsification is an efficient way to reduce the on-chip memory demand of deep convolutional neural networks (CNNs). Most state-of-the-art CNN accelerators deliver high throughput for sparse CNNs by searching for pairs of nonzero weights and activations and then sending them to processing elements (PEs) for multiply-accumulate (MAC) operations. However, their PE arrays are difficult to scale up for faster and more efficient computing because of the significant internal interconnect and memory bandwidth consumption. To address this dilemma, we propose a sparsity-aware architecture, called Swan, which eliminates the search process for sparse CNNs under limited interconnect and bandwidth resources. The architecture comprises two parts: a MAC unit that removes the search operation from the sparsity-aware MAC calculation, and a systolic compressive dataflow that suits the MAC architecture well and heavily reuses inputs to save interconnect and bandwidth. With the proposed architecture, only one column of PEs needs to load/store data while all PEs operate at full scale. Evaluation results based on a place-and-route flow show that the proposed design, with 4096 PEs in a compact form factor, 4.9 TOP/s peak performance, and 2.97 W power at 600 MHz, achieves a 1.5-2.1× speedup and 6.0-9.1× higher energy efficiency than state-of-the-art CNN accelerators with the same PE scale.
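
To make the abstract's central contrast concrete, the sketch below compares a search-based MAC stream, which first pairs nonzero weights and activations before dispatching them to PEs, with a search-free, zero-gated MAC in which every PE receives the full operand stream and simply skips multiplications involving a zero. This is a minimal Python sketch under our own assumptions; the function names and the gating scheme are hypothetical, since the abstract does not describe Swan's MAC unit or systolic dataflow at this level of detail.

# Illustrative sketch only: the abstract does not publish Swan's RTL or
# pseudocode, so the function names and zero-gating scheme are hypothetical.
def search_based_mac(weights, activations):
    # Search-based accelerators first pair up nonzero (weight, activation)
    # operands; this pairing/search step is what costs interconnect and bandwidth.
    pairs = [(w, a) for w, a in zip(weights, activations) if w != 0 and a != 0]
    acc = 0
    for w, a in pairs:
        acc += w * a
    return acc

def search_free_mac(weights, activations):
    # Search-free style: operands are streamed to every PE unconditionally,
    # and the MAC is simply gated off when either operand is zero.
    acc = 0
    for w, a in zip(weights, activations):
        if w != 0 and a != 0:
            acc += w * a
    return acc

if __name__ == "__main__":
    w = [0, 3, 0, -1, 2, 0]
    a = [5, 0, 0, 4, 1, 7]
    assert search_based_mac(w, a) == search_free_mac(w, a) == -2
    print(search_free_mac(w, a))  # prints -2

As a sanity check on the reported figures, the stated peak throughput is consistent with the PE count and clock: 4096 PEs × 600 MHz × 2 operations per MAC ≈ 4.9 TOP/s.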