Digital Computation-in-Memory Design with Adaptive Floating Point for Deep Neural Networks

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). Pub Date: 2022-12-01. DOI: 10.1109/MCSoC57363.2022.00042

Yunhan Yang, Wei Lu, Po-Tsang Huang, Hung-Ming Chen


Citations: 0

Abstract


All-digital deep neural network (DNN) accelerators and processors suffer from the von Neumann bottleneck because of the massive data movement that DNNs require. Computation-in-memory (CIM) reduces this data movement by performing the computations inside the memory, which alleviates the problem. However, analog CIM is susceptible to process, voltage, and temperature (PVT) variations and is limited by analog-to-digital and digital-to-analog conversions (ADC/DAC). Most current digital CIM techniques adopt integer operations and the bit-serial method, so their throughput is limited by the total number of bits that must be processed. Moreover, they use an adder tree for accumulation, which causes severe area overhead. In this paper, a folded architecture based on time-division multiplexing is proposed to reduce the area and improve the energy efficiency without reducing the throughput. We quantize and ternarize the adaptive floating-point (ADP) format at low bit widths, which achieves accuracy equal to or better than that of integer quantization, to reduce the energy cost of computation and data movement. The proposed technique improves overall throughput and energy efficiency by up to 3.83x and 2.19x, respectively, compared with other state-of-the-art integer-based digital CIMs.
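The two costs the abstract attributes to existing digital CIMs can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions and is not the circuit proposed in the paper: `bit_serial_dot` and `folded_sum` are hypothetical names, activations are assumed to be unsigned integers, and the `lanes` parameter stands in for a small set of adders reused over several time steps. It shows that a bit-serial dot product needs one pass per activation bit, and that a time-multiplexed accumulator trades a wide adder tree for more steps on fewer adders.

```python
def bit_serial_dot(activations, weights, act_bits):
    """Dot product computed one activation bit plane per cycle.

    Assumes unsigned activations that fit in `act_bits` bits and
    signed integer weights. Returns (result, cycles).
    """
    acc = 0
    cycles = 0
    for b in range(act_bits):  # one pass ("cycle") per activation bit
        # Each 1-bit activation gates its weight; the gated weights are
        # summed, which is the job an adder tree does in hardware.
        partial = sum(w for a, w in zip(activations, weights) if (a >> b) & 1)
        acc += partial << b  # shift-and-add by bit significance
        cycles += 1
    return acc, cycles


def folded_sum(values, lanes):
    """Accumulate `values` by reusing `lanes` adders over several steps.

    A hypothetical software stand-in for time-division multiplexing:
    fewer adders, more steps. Returns (result, steps).
    """
    acc = 0
    steps = 0
    for i in range(0, len(values), lanes):
        acc += sum(values[i:i + lanes])  # one time slot on the shared adders
        steps += 1
    return acc, steps


if __name__ == "__main__":
    acts = [3, 5, 2, 7]
    wts = [1, -2, 4, -1]
    print(bit_serial_dot(acts, wts, act_bits=3))
    print(folded_sum(list(range(1, 9)), lanes=2))
```

In this model the cycle count of `bit_serial_dot` grows linearly with `act_bits`, which is the throughput limit the abstract describes, while `folded_sum` makes the area/time trade-off explicit through `lanes`. How the paper keeps throughput unchanged while folding is not stated in the abstract and is not modeled here.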
