A 13.7 TFLOPS/W Floating-point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory

Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, Jinsu Lee, H. Yoo
{"title":"A 13.7 TFLOPS/W Floating-point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory","authors":"Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, Jinsu Lee, H. Yoo","doi":"10.23919/VLSICircuits52068.2021.9492476","DOIUrl":null,"url":null,"abstract":"An energy-efficient floating-point DNN training processor is proposed with heterogenous bfloat16 computing architecture using exponent computing-in-memory (CIM) and mantissa processing engine. Mantissa free exponent calculation enables pipelining of exponent and mantissa operation for heterogenous bfloat16 computing while reducing MAC power by 14.4 %. 6T SRAM exponent computing-in-memory with bitline charge reusing reduces memory access power by 46.4 %. The processor fabricated in 28 nm CMOS technology and occupies 1.62×3.6 mm2 die area. It achieves 13.7 TFLOPS/W energy efficiency which is 274× higher than the previous floating-point CIM processor.","PeriodicalId":106356,"journal":{"name":"2021 Symposium on VLSI Circuits","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Symposium on VLSI Circuits","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/VLSICircuits52068.2021.9492476","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

An energy-efficient floating-point DNN training processor is proposed with heterogenous bfloat16 computing architecture using exponent computing-in-memory (CIM) and mantissa processing engine. Mantissa free exponent calculation enables pipelining of exponent and mantissa operation for heterogenous bfloat16 computing while reducing MAC power by 14.4 %. 6T SRAM exponent computing-in-memory with bitline charge reusing reduces memory access power by 46.4 %. The processor fabricated in 28 nm CMOS technology and occupies 1.62×3.6 mm2 die area. It achieves 13.7 TFLOPS/W energy efficiency which is 274× higher than the previous floating-point CIM processor.
基于内存指数计算的异构计算架构的13.7 TFLOPS/W浮点DNN处理器
采用指数内存计算(CIM)和尾数处理引擎,提出了一种具有异构bfloat16计算架构的节能浮点深度神经网络训练处理器。无尾数的指数计算使指数和尾数运算在异构bfloat16计算中实现流水线化,同时将MAC功耗降低14.4%。采用位线电荷重用的6T SRAM指数内存计算可使内存访问功率降低46.4%。该处理器采用28纳米CMOS技术制造,芯片面积为1.62×3.6 mm2。它实现了13.7 TFLOPS/W的能效,比以前的浮点型CIM处理器提高了274倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信