SME: A Systolic Multiply-accumulate Engine for MLP-based Neural Network

Haochuan Wan, Chaolin Rao, Yueyang Zheng, Pingqiang Zhou, Xin Lou

2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)
Published: 2022-11-11
DOI: 10.1109/APCCAS55924.2022.10090307
Citations: 0
Abstract
In this paper, we propose an output-stationary systolic multiply-accumulate engine (SME) with an optimized dataflow for multilayer perceptron (MLP) computation in state-of-the-art Neural Radiance Field (NeRF) algorithms. We also analyze the activation patterns of the NeRF algorithm, which uses ReLU as the activation function, and find that the activations can be sparse, especially in the last several layers. We therefore further exploit this activation sparsity by gating the corresponding multiplications in the SME to save power. The proposed SME is implemented in SpinalHDL, which is translated to Verilog HDL for VLSI implementation in a 40 nm CMOS technology. Evaluation results show that, operating at 400 MHz, the proposed SME occupies 31.371 mm² of circuit area and consumes 873.7 mW of power, translating to an energy efficiency of 12,708.10 ksamples/J and an area efficiency of 360.06 ksamples/s/mm².
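The two ideas in the abstract, an output-stationary dataflow (each processing element holds one output accumulator while activations and weights stream past it) and gating multiplications whose activation operand is zero, can be illustrated with a minimal behavioral sketch. This is not the SME's RTL; the function name, loop structure, and gating counter are illustrative assumptions, and the k-loop stands in for one systolic wavefront per cycle.

```python
import numpy as np

def output_stationary_matmul(A, W, relu=True):
    """Behavioral sketch of an output-stationary systolic MAC array.

    PE (i, j) keeps accumulator C[i, j] stationary; activations stream
    along rows and weights along columns, one k-step per wavefront.
    A multiplication is "gated" (skipped) when its activation operand
    is zero, modeling the power-saving use of ReLU-induced sparsity.
    """
    M, K = A.shape
    K2, N = W.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    gated = 0   # multiplications skipped because the activation is 0
    total = 0   # multiplications the dense array would perform
    for k in range(K):              # one systolic wavefront
        for i in range(M):
            a = A[i, k]             # activation entering row i
            for j in range(N):
                total += 1
                if a == 0.0:        # clock-gate this PE's multiplier
                    gated += 1
                    continue
                C[i, j] += a * W[k, j]
    out = np.maximum(C, 0.0) if relu else C
    return out, gated, total
```

With sparse post-ReLU activations (zeros concentrated in later layers, as the paper observes), the `gated/total` ratio directly tracks the fraction of multiplier switching activity that the gating avoids, while the result remains bit-identical to the dense computation.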