采用压缩LUT乘法器和低功耗WL电压微调方案的55nm 32Mb数字Flash CIM用于AI边缘推断

2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) Pub Date : 2022-11-11 DOI:10.1109/APCCAS55924.2022.10090358

Hongyang Hu, Zi Wang, Xiaoxin Xu, K. Xi, Kun Zhang, Junyu Zhang, C. Dou

{"title":"采用压缩LUT乘法器和低功耗WL电压微调方案的55nm 32Mb数字Flash CIM用于AI边缘推断","authors":"Hongyang Hu, Zi Wang, Xiaoxin Xu, K. Xi, Kun Zhang, Junyu Zhang, C. Dou","doi":"10.1109/APCCAS55924.2022.10090358","DOIUrl":null,"url":null,"abstract":"In this work, we proposed a digital flash computing-in-memory (CIM) architecture using compressed lookup-table multiplier (CLUTM) and low power word-line voltage trimming (LP-WLVT) schemes. The proposed concept is highly compatible to the standard commodity NOR flash memory. Compared to the conventional lookup-table (LUT) multipliers, CLUTM results in 32 times reduction on the area cost in the case of 8-bit multiplication. The LP-WLVT scheme can further reduce the inference power by 14%. The concept is silicon demonstrated in a 55nm 32Mb commercial flash memory, which can perform 8-bit multiply-and-accumulate (MAC) with a throughput of 51.2 GOPs. It provides 1.778ms frame shift when running TC-resnet8 network, which is $5 \\times$ more efficient than the previous works. The CLUTM-based digital CIM architecture can play an important role to enable commercial flash for highly-efficient AI edge inference.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A 55nm 32Mb Digital Flash CIM Using Compressed LUT Multiplier and Low Power WL Voltage Trimming Scheme for AI Edge Inference\",\"authors\":\"Hongyang Hu, Zi Wang, Xiaoxin Xu, K. Xi, Kun Zhang, Junyu Zhang, C. Dou\",\"doi\":\"10.1109/APCCAS55924.2022.10090358\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we proposed a digital flash computing-in-memory (CIM) architecture using compressed lookup-table multiplier (CLUTM) and low power word-line voltage trimming (LP-WLVT) schemes. The proposed concept is highly compatible to the standard commodity NOR flash memory. Compared to the conventional lookup-table (LUT) multipliers, CLUTM results in 32 times reduction on the area cost in the case of 8-bit multiplication. The LP-WLVT scheme can further reduce the inference power by 14%. The concept is silicon demonstrated in a 55nm 32Mb commercial flash memory, which can perform 8-bit multiply-and-accumulate (MAC) with a throughput of 51.2 GOPs. It provides 1.778ms frame shift when running TC-resnet8 network, which is $5 \\\\times$ more efficient than the previous works. The CLUTM-based digital CIM architecture can play an important role to enable commercial flash for highly-efficient AI edge inference.\",\"PeriodicalId\":243739,\"journal\":{\"name\":\"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APCCAS55924.2022.10090358\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APCCAS55924.2022.10090358","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

在这项工作中，我们提出了一个使用压缩查找表乘法器(CLUTM)和低功耗字线电压微调(LP-WLVT)方案的数字闪存内存计算(CIM)架构。所提出的概念与标准的商用NOR快闪记忆体高度相容。与传统的查找表(LUT)乘法器相比，在8位乘法的情况下，CLUTM的面积成本降低了32倍。LP-WLVT方案可以进一步降低14%的推理能力。该概念在55nm 32Mb商用闪存中得到了验证，该闪存可以执行8位乘法和累积(MAC)，吞吐量为51.2 GOPs。它在运行TC-resnet8网络时提供1.778ms的帧移位，比以前的工作效率提高了5倍。基于clutm的数字CIM架构可以为实现高效AI边缘推理的商用闪存发挥重要作用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A 55nm 32Mb Digital Flash CIM Using Compressed LUT Multiplier and Low Power WL Voltage Trimming Scheme for AI Edge Inference

In this work, we proposed a digital flash computing-in-memory (CIM) architecture using compressed lookup-table multiplier (CLUTM) and low power word-line voltage trimming (LP-WLVT) schemes. The proposed concept is highly compatible to the standard commodity NOR flash memory. Compared to the conventional lookup-table (LUT) multipliers, CLUTM results in 32 times reduction on the area cost in the case of 8-bit multiplication. The LP-WLVT scheme can further reduce the inference power by 14%. The concept is silicon demonstrated in a 55nm 32Mb commercial flash memory, which can perform 8-bit multiply-and-accumulate (MAC) with a throughput of 51.2 GOPs. It provides 1.778ms frame shift when running TC-resnet8 network, which is $5 \times$ more efficient than the previous works. The CLUTM-based digital CIM architecture can play an important role to enable commercial flash for highly-efficient AI edge inference.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)

自引率

0.00%

发文量