基于4位笛卡尔积MAC阵列和流水线激活对齐器的深度随机稀疏神经网络边缘推理引擎

2021 IEEE Hot Chips 33 Symposium (HCS) Pub Date : 2021-08-22 DOI:10.1109/HCS52781.2021.9567328

Kota Ando, Jaehoon Yu, Kazutoshi Hirose, Hiroki Nakahara, Kazushi Kawamura, Thiem Van Chu, M. Motomura

{"title":"基于4位笛卡尔积MAC阵列和流水线激活对齐器的深度随机稀疏神经网络边缘推理引擎","authors":"Kota Ando, Jaehoon Yu, Kazutoshi Hirose, Hiroki Nakahara, Kazushi Kawamura, Thiem Van Chu, M. Motomura","doi":"10.1109/HCS52781.2021.9567328","DOIUrl":null,"url":null,"abstract":"A 4b-quantized convolutional neural network (CNN) inference engine for edge-AI is presented featuring a Cartesian-product MAC array and pipelined activation aligners targeting deep-/random-pruned models. A 40nm prototype with 32x32 MACs and 5Mb SRAM runs at 534 MHz, 1.07 TOPS, 352 mW at 1.1V, and attains 5.30 dense TOPS/W, 234 MHz at 0.8V. Sparse TOPS/W reaches 26.5 when running a randomly pruned model (after 88% pruning). Training algorithms for obtaining highly efficient sparse/quantized models are also proposed.","PeriodicalId":246531,"journal":{"name":"2021 IEEE Hot Chips 33 Symposium (HCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Edge Inference Engine for Deep & Random Sparse Neural Networks with 4-bit Cartesian-Product MAC Array and Pipelined Activation Aligner\",\"authors\":\"Kota Ando, Jaehoon Yu, Kazutoshi Hirose, Hiroki Nakahara, Kazushi Kawamura, Thiem Van Chu, M. Motomura\",\"doi\":\"10.1109/HCS52781.2021.9567328\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A 4b-quantized convolutional neural network (CNN) inference engine for edge-AI is presented featuring a Cartesian-product MAC array and pipelined activation aligners targeting deep-/random-pruned models. A 40nm prototype with 32x32 MACs and 5Mb SRAM runs at 534 MHz, 1.07 TOPS, 352 mW at 1.1V, and attains 5.30 dense TOPS/W, 234 MHz at 0.8V. Sparse TOPS/W reaches 26.5 when running a randomly pruned model (after 88% pruning). Training algorithms for obtaining highly efficient sparse/quantized models are also proposed.\",\"PeriodicalId\":246531,\"journal\":{\"name\":\"2021 IEEE Hot Chips 33 Symposium (HCS)\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Hot Chips 33 Symposium (HCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HCS52781.2021.9567328\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Hot Chips 33 Symposium (HCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HCS52781.2021.9567328","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

提出了一种用于边缘人工智能的4b量化卷积神经网络(CNN)推理引擎，该引擎采用笛卡尔积MAC阵列和针对深度/随机修剪模型的流水线激活对齐器。具有32x32 mac和5Mb SRAM的40nm原型机在1.1V下运行534 MHz, 1.07 TOPS, 352 mW，并在0.8V下达到5.30 dense TOPS/W, 234 MHz。运行随机剪枝模型时(剪枝88%后)，Sparse TOPS/W达到26.5。本文还提出了获得高效稀疏/量化模型的训练算法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Edge Inference Engine for Deep & Random Sparse Neural Networks with 4-bit Cartesian-Product MAC Array and Pipelined Activation Aligner

A 4b-quantized convolutional neural network (CNN) inference engine for edge-AI is presented featuring a Cartesian-product MAC array and pipelined activation aligners targeting deep-/random-pruned models. A 40nm prototype with 32x32 MACs and 5Mb SRAM runs at 534 MHz, 1.07 TOPS, 352 mW at 1.1V, and attains 5.30 dense TOPS/W, 234 MHz at 0.8V. Sparse TOPS/W reaches 26.5 when running a randomly pruned model (after 88% pruning). Training algorithms for obtaining highly efficient sparse/quantized models are also proposed.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE Hot Chips 33 Symposium (HCS)

自引率

0.00%

发文量