A 13.7 TFLOPS/W Floating-point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory

2021 Symposium on VLSI Circuits | Pub Date: 2021-06-13 | DOI: 10.23919/VLSICircuits52068.2021.9492476

Juhyoung Lee, Jihoon Kim, Wooyoung Jo, Sangyeob Kim, Sangjin Kim, Jinsu Lee, H. Yoo


Citations: 14

Abstract

An energy-efficient floating-point DNN training processor is proposed with a heterogeneous bfloat16 computing architecture that uses exponent computing-in-memory (CIM) and a mantissa processing engine. Mantissa-free exponent calculation enables pipelining of the exponent and mantissa operations for heterogeneous bfloat16 computing while reducing MAC power by 14.4%. 6T SRAM exponent computing-in-memory with bitline charge reuse reduces memory access power by 46.4%. The processor is fabricated in 28 nm CMOS technology and occupies a 1.62×3.6 mm² die area. It achieves an energy efficiency of 13.7 TFLOPS/W, which is 274× higher than that of the previous floating-point CIM processor.
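The abstract's split between an exponent path and a mantissa path can be illustrated with ordinary bfloat16 arithmetic. The sketch below is only a software illustration of the number format (1 sign bit, 8 exponent bits, 7 mantissa bits), not the authors' circuit or their mantissa-free exponent scheme. The function names `to_bf16_bits`, `split_bf16`, and `bf16_multiply` are hypothetical, and the sketch assumes normal, nonzero inputs, truncation when converting to bfloat16, and no rounding or renormalization of the product.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Return the bfloat16 bit pattern of x by truncating its float32 encoding."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # bfloat16 is the upper 16 bits of float32


def split_bf16(bits: int):
    """Split a bfloat16 bit pattern into (sign, biased exponent, mantissa)."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 7) & 0xFF
    mantissa = bits & 0x7F
    return sign, exponent, mantissa


def bf16_multiply(a_bits: int, b_bits: int) -> float:
    """Multiply two normal bfloat16 values using separate exponent and mantissa paths.

    Assumption: zero, subnormal, infinity, and NaN inputs are not handled.
    """
    sa, ea, ma = split_bf16(a_bits)
    sb, eb, mb = split_bf16(b_bits)

    # Exponent path: uses only the exponent fields (bias of 127 removed once).
    exp_sum = ea + eb - 127

    # Mantissa path: integer product of the 8-bit significands (implicit leading 1).
    # Each significand is scaled by 2**7, so the product is scaled by 2**14.
    sig_product = (0x80 | ma) * (0x80 | mb)

    sign = sa ^ sb
    # Recombine the two paths; the result is left unrounded as a Python float.
    return (-1) ** sign * (sig_product / 2**14) * 2.0 ** (exp_sum - 127)


product = bf16_multiply(to_bf16_bits(1.5), to_bf16_bits(-2.0))
```

Because `exp_sum` depends only on the exponent fields and `sig_product` only on the mantissa fields, the two quantities can be computed independently, which is the property a pipelined exponent/mantissa design relies on. A full implementation would also renormalize the significand product and round it back to 7 mantissa bits.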

