ARCHON: A 332.7TOPS/W 5b Variation-Tolerant Analog CNN Processor Featuring Analog Neuronal Computation Unit and Analog Memory

Jin-O Seo, Mingoo Seok, Seonghwan Cho
{"title":"ARCHON: A 332.7TOPS/W 5b Variation-Tolerant Analog CNN Processor Featuring Analog Neuronal Computation Unit and Analog Memory","authors":"Jin-O Seo, Mingoo Seok, Seonghwan Cho","doi":"10.1109/ISSCC42614.2022.9731654","DOIUrl":null,"url":null,"abstract":"One of the notable trends in convolutional neural network (CNN) processor architecture is to embrace analog hardware to improve energy efficiency in performing multiply-and-accumulate (MAC). Prior works investigated charge redistribution in a capacitor array [4], [5], phase accumulation in oscillators [2], [6], and the integrator in a delta-sigma modulator [3]. However, these works suffer from two critical challenges. First, they all need frequent use of ADCs and DACs to store and access the large intermediate computation results, i.e. feature maps, to and from the digital SRAM. The energy consumption of such data conversion severely limits the overall energy efficiency. To mitigate it, [1] uses analog memory but only for temporary data and it still requires a large amount of data conversion for computing multiple layers of a CNN model. Second, analog circuits including analog memory inherently exhibit non-negligible variability. Important parameters such as comparator threshold voltage, oscillator frequency, etc., vary across process, voltage, and temperature (PVT), limiting the computing precision of analog hardware. It is critical to increase the tolerance to these variations. In this work, aiming to address these challenges, we propose ARCHON, an analog CNN processor featuring an analog neuronal computation unit (ANU) and an analog memory (AMEM). Designed to tolerate a large amount of PVT variations, ANU and AMEM can perform computations needed for a CNN model in the analog domain, across layers, without data conversions. Fabricated in 28nm CMOS, the proposed processor achieves a state-of-the-art energy-efficiency of 332.7TOPS/W (analog datapath) and 19.9TOPS/W (processor level), while maintaining the inference accuracy across supply voltage and temperature variations.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"147 1","pages":"258-260"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC42614.2022.9731654","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

One of the notable trends in convolutional neural network (CNN) processor architecture is to embrace analog hardware to improve energy efficiency in performing multiply-and-accumulate (MAC) operations. Prior works investigated charge redistribution in a capacitor array [4], [5], phase accumulation in oscillators [2], [6], and the integrator in a delta-sigma modulator [3]. However, these works suffer from two critical challenges. First, they all require frequent use of ADCs and DACs to move the large intermediate computation results, i.e., feature maps, to and from digital SRAM. The energy consumed by this data conversion severely limits the overall energy efficiency. To mitigate this, [1] uses analog memory, but only for temporary data; it still requires a large amount of data conversion when computing multiple layers of a CNN model. Second, analog circuits, including analog memory, inherently exhibit non-negligible variability. Important parameters such as comparator threshold voltage and oscillator frequency vary across process, voltage, and temperature (PVT), limiting the computing precision of analog hardware, so it is critical to increase tolerance to these variations. In this work, aiming to address these challenges, we propose ARCHON, an analog CNN processor featuring an analog neuronal computation unit (ANU) and an analog memory (AMEM). Designed to tolerate large PVT variations, the ANU and AMEM can perform the computations needed for a CNN model in the analog domain, across layers, without data conversions. Fabricated in 28nm CMOS, the proposed processor achieves a state-of-the-art energy efficiency of 332.7TOPS/W (analog datapath) and 19.9TOPS/W (processor level), while maintaining inference accuracy across supply-voltage and temperature variations.
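The reported efficiency figures translate directly into energy per operation, and the variation problem the abstract describes can be made concrete with a small behavioral model. The Python sketch below is not the authors' implementation; it is a minimal illustration assuming a 5b analog MAC whose effective gain and offset drift with PVT, with all spread values (the 5% gain sigma, 1% offset sigma, etc.) chosen purely for illustration.

```python
import numpy as np

# Energy per operation implied by the reported efficiency numbers:
# 1 TOPS/W = 1e12 ops per joule, i.e. 1 pJ (1000 fJ) per operation.
for label, tops_per_w in [("analog datapath", 332.7), ("processor level", 19.9)]:
    fj_per_op = 1e3 / tops_per_w
    print(f"{label}: {tops_per_w} TOPS/W -> {fj_per_op:.1f} fJ/op")

# Behavioral model of a 5b analog MAC under PVT variation (hypothetical
# parameters; this is not ARCHON's circuit, just an error-budget sketch).
def analog_mac(weights, activations, gain_error=0.0, offset=0.0, bits=5):
    """Ideal dot product distorted by a multiplicative gain error and an
    additive offset, then quantized to `bits` over the range [-1, 1)."""
    ideal = np.dot(weights, activations)
    distorted = (1.0 + gain_error) * ideal + offset
    step = 2.0 / (2 ** bits)                 # quantizer LSB (0.0625 for 5b)
    return np.clip(np.round(distorted / step) * step, -1.0, 1.0 - step)

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, 64) / 64              # scale so the sum stays in range
x = rng.uniform(0, 1, 64)
ideal = np.dot(w, x)

# Monte-Carlo over assumed PVT spreads: 5% (1-sigma) gain error and 1% of
# full scale (1-sigma) offset -- illustrative values, not measured ones.
errors = [
    analog_mac(w, x, rng.normal(0.0, 0.05), rng.normal(0.0, 0.01)) - ideal
    for _ in range(1000)
]
print(f"MAC error std: {np.std(errors):.4f}  (1 LSB = {2 / 2**5:.4f})")
```

Under these assumed spreads, a gain error of a few percent of full scale is roughly comparable to one 5b quantizer LSB (2/2^5 ≈ 0.06 of full scale), which illustrates why, at this precision, variation tolerance rather than raw resolution becomes the binding constraint for an analog datapath.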