{"title":"Digital Computation-in-Memory Design with Adaptive Floating Point for Deep Neural Networks","authors":"Yunhan Yang, Wei Lu, Po-Tsang Huang, Hung-Ming Chen","doi":"10.1109/MCSoC57363.2022.00042","DOIUrl":null,"url":null,"abstract":"All-digital deep neural network (DNN) accelerators or processors suffer from the Von-Neumann bottleneck, because of the massive data movement required in DNNs. Computation-in-memory (CIM) can reduce the data movement by performing the computations in the memory to save the above problem. However, the analog CIM is susceptible to PVT variations and limited by the analog-digital/digital-analog conversions (ADC/DAC). Most of the current digital CIM techniques adopt integer operation and the bit-serial method, which limits the throughput to the total number of bits. Moreover, they use the adder tree for accumulation, which causes severe area overhead. In this paper, a folded architecture based on time-division multiplexing is proposed to reduce the area and improve the energy efficiency without reducing the throughput. We quantize and ternarize the adaptive floating point (ADP) format with low bits, which can achieve the same or better accuracy than integer quantization, to improve the energy cost of calculation and data movement. This proposed technique can improve the overall throughput and energy efficiency up to 3.83x and 2.19x, respectively, compared to other state-of-the-art digital CIMs with integer.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC57363.2022.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
All-digital deep neural network (DNN) accelerators and processors suffer from the von Neumann bottleneck because of the massive data movement that DNNs require. Computation-in-memory (CIM) alleviates this problem by performing computations inside the memory, which reduces data movement. However, analog CIM is susceptible to PVT variations and is limited by its analog-to-digital and digital-to-analog converters (ADCs/DACs). Most current digital CIM techniques adopt integer operations with bit-serial processing, so throughput is bounded by the total number of operand bits; moreover, they rely on adder trees for accumulation, which incurs severe area overhead. In this paper, a folded architecture based on time-division multiplexing is proposed to reduce area and improve energy efficiency without sacrificing throughput. We further quantize and ternarize weights into a low-bit adaptive floating-point (ADP) format, which achieves accuracy equal to or better than integer quantization while reducing the energy cost of computation and data movement. Compared to other state-of-the-art integer-based digital CIM designs, the proposed technique improves overall throughput and energy efficiency by up to 3.83x and 2.19x, respectively.
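The abstract does not specify the ADP format's encoding, so the following is a minimal sketch of the general adaptive floating-point idea it builds on: choose a per-tensor exponent range aligned to the data's dynamic range, then round each value's mantissa to a small number of bits. The function name `adp_quantize` and the parameters `exp_bits` and `man_bits` are illustrative assumptions, not the paper's implementation, and the ternarization step mentioned in the abstract is not shown.

```python
import numpy as np

def adp_quantize(x, exp_bits=3, man_bits=3):
    """Hypothetical sketch of adaptive floating-point quantization:
    pick a per-tensor exponent window so the format's largest value
    covers max|x|, then round each mantissa to man_bits bits."""
    x = np.asarray(x, dtype=np.float64)
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x)
    n_exp = 2 ** exp_bits                      # number of exponent codes
    exp_max = int(np.floor(np.log2(max_abs)))  # exponent of the largest value
    exp_min = exp_max - n_exp + 1              # smallest usable exponent
    sign = np.sign(x)
    mag = np.abs(x)
    # Per-element exponent, clipped to the adaptive window; values far
    # below 2**exp_min flush to zero in the mantissa rounding step.
    e = np.clip(np.floor(np.log2(np.maximum(mag, 2.0 ** exp_min))),
                exp_min, exp_max)
    step = 2.0 ** (e - man_bits)               # mantissa LSB at exponent e
    q = np.round(mag / step) * step            # round mantissa to man_bits
    # Clamp mantissa overflow at the top exponent to the format maximum.
    max_val = 2.0 ** exp_max * (2.0 - 2.0 ** (-man_bits))
    return sign * np.minimum(q, max_val)

# Usage: quantize a random weight tensor to a sign + 3-bit-exponent +
# 2-bit-mantissa format whose range adapts to this tensor.
w = np.random.randn(4, 4)
w_q = adp_quantize(w, exp_bits=3, man_bits=2)
```

Because the exponent window follows each tensor's own range, such a format can cover the wide dynamic range of DNN weights with very few bits, which is the property the abstract credits for matching or beating integer quantization at equal bit width.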