{"title":"用于深度神经网络加速器的动态精度乘法器","authors":"Chen Ding, Y. Huan, Lirong Zheng, Z. Zou","doi":"10.1109/socc49529.2020.9524752","DOIUrl":null,"url":null,"abstract":"The application of dynamic precision multipliers in the deep neural network accelerators can greatly improve system's data processing capacity under same memory bandwidth limitation. This paper presents a Dynamic Precision Multiplier (DPM) for deep learning accelerators to adapt to light-weight deep learning models with varied precision. The proposed DPM adopts Booth algorithm and Wallace Adder Tree to support parallel computation of signed/unsigned one 16-bit, two 8-bit or four 4-bit at run time. The DPM is further optimized with simplified partial product selection logic and mixed partial product selection structure techniques, reducing power cost for energy-efficient edge computing. The DPM is evaluated in both FPGA and ASIC flow, and the results show that 4-bit mode consumes the least energy among the three modes at 1.34pJ/word. It also saves nearly 22.38% and 232.17% of the power consumption under 16-bit and 8-bit mode respectively when comparing with previous similar designs.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Dynamic Precision Multiplier For Deep Neural Network Accelerators\",\"authors\":\"Chen Ding, Y. Huan, Lirong Zheng, Z. Zou\",\"doi\":\"10.1109/socc49529.2020.9524752\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The application of dynamic precision multipliers in the deep neural network accelerators can greatly improve system's data processing capacity under same memory bandwidth limitation. 
This paper presents a Dynamic Precision Multiplier (DPM) for deep learning accelerators to adapt to light-weight deep learning models with varied precision. The proposed DPM adopts Booth algorithm and Wallace Adder Tree to support parallel computation of signed/unsigned one 16-bit, two 8-bit or four 4-bit at run time. The DPM is further optimized with simplified partial product selection logic and mixed partial product selection structure techniques, reducing power cost for energy-efficient edge computing. The DPM is evaluated in both FPGA and ASIC flow, and the results show that 4-bit mode consumes the least energy among the three modes at 1.34pJ/word. It also saves nearly 22.38% and 232.17% of the power consumption under 16-bit and 8-bit mode respectively when comparing with previous similar designs.\",\"PeriodicalId\":114740,\"journal\":{\"name\":\"2020 IEEE 33rd International System-on-Chip Conference (SOCC)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 33rd International System-on-Chip Conference (SOCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/socc49529.2020.9524752\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 33rd International System-on-Chip Conference 
(SOCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/socc49529.2020.9524752","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dynamic Precision Multiplier For Deep Neural Network Accelerators
Applying dynamic-precision multipliers in deep neural network accelerators can greatly improve a system's data-processing capacity under the same memory-bandwidth limitation. This paper presents a Dynamic Precision Multiplier (DPM) for deep learning accelerators that adapts to lightweight deep learning models of varied precision. The proposed DPM adopts the Booth algorithm and a Wallace adder tree to support parallel computation of one 16-bit, two 8-bit, or four 4-bit signed/unsigned multiplications at run time. The DPM is further optimized with simplified partial-product selection logic and a mixed partial-product selection structure, reducing power cost for energy-efficient edge computing. The DPM is evaluated in both FPGA and ASIC flows, and the results show that the 4-bit mode consumes the least energy of the three modes, at 1.34 pJ/word. Compared with previous similar designs, it also saves nearly 22.38% and 232.17% of the power consumption in 16-bit and 8-bit modes, respectively.
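To make the mode-splitting concrete, the sketch below models the lane behavior the abstract describes: a 16-bit datapath that is reused as one 16-bit, two 8-bit, or four 4-bit independent multiplications depending on the selected mode. This is a hypothetical behavioral model for illustration only (the function name `dpm_multiply` and the unsigned-lane simplification are assumptions); it does not reproduce the paper's Booth encoding, Wallace adder tree, or partial-product selection hardware.

```python
def dpm_multiply(a, b, mode):
    """Behavioral model of a dynamic-precision multiplier's lane modes.

    a, b : 16-bit packed operand words.
    mode : lane width in bits -- 16 (one multiply), 8 (two multiplies),
           or 4 (four multiplies), mirroring the DPM's three run-time modes.

    Returns the list of per-lane products, lowest lane first.
    Lanes are treated as unsigned here for simplicity; the actual DPM
    also supports signed operands.
    """
    assert mode in (16, 8, 4), "unsupported lane width"
    lanes = 16 // mode            # number of independent multiplications
    mask = (1 << mode) - 1        # extracts one lane's bits
    products = []
    for i in range(lanes):
        ai = (a >> (i * mode)) & mask   # i-th lane of operand a
        bi = (b >> (i * mode)) & mask   # i-th lane of operand b
        products.append(ai * bi)
    return products
```

For example, in 8-bit mode the operand word 0x0203 is split into the lanes 0x03 and 0x02, so multiplying it by 0x0504 yields the two independent products 3*4 and 2*5. Packing narrower operands this way is what lets the accelerator move more operands per memory fetch under the same bandwidth.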