Hadamard product-based in-memory computing design for floating point neural network training

Neuromorphic Computing and Engineering. Pub Date: 2023-02-09. DOI: 10.1088/2634-4386/acbab9

Anjunyi Fan, Yihan Fu, Yaoyu Tao, Zhonghua Jin, Haiyue Han, Huiyu Liu, Yaojun Zhang, Bonan Yan, Yuch-Chi Yang, Ru Huang



Abstract

Deep neural networks (DNNs) are a key area of machine learning, and they require considerable computational resources for cognitive tasks. In-memory computing (IMC), a novel technology that performs computation inside or near the memory units, significantly improves computing efficiency by reducing repetitive data transfer between the processing and memory units. However, prior IMC designs mainly focus on accelerating DNN inference, and DNN training on IMC hardware has rarely been proposed. The challenges lie in the requirements of DNN training for high precision (e.g. floating point (FP)) and for various tensor operations (e.g. inner and outer products). These challenges call for an IMC design with new features. This paper proposes a novel Hadamard product-based IMC design for FP DNN training. Our design consists of multiple compartments, which are the basic units for element-wise matrix processing. We also develop BFloat16 post-processing circuits and fused adder trees, laying the foundation for IMC FP processing. Based on the proposed circuit scheme, we reformulate the back-propagation training algorithm for convenient and efficient IMC execution. The proposed design is implemented with commercial 28 nm technology process design kits and benchmarked with widely used neural networks. We model the influence of the circuit structural design parameters and provide an analysis framework for design space exploration. Our simulation validates that MobileNet training with the proposed IMC scheme saves 91.2% in energy and 13.9% in time versus the same task on an NVIDIA GTX 3060 GPU. The proposed IMC design has a data density of 769.2 Kb mm⁻² with the FP processing circuits included, a 3.5× improvement over prior FP IMC designs.
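The abstract notes that training needs both inner and outer products, and that the design is built around element-wise (Hadamard) processing with BFloat16 post-processing and adder trees. The following is a minimal software sketch of the general idea that both tensor operations can be expressed as element-wise products, with a tree reduction added for the inner product. It is an illustration under stated assumptions, not the circuit or the reformulated back-propagation algorithm from the paper: the function names (`to_bfloat16`, `hadamard`, `tree_sum`, `matvec`, `outer`) are hypothetical, BFloat16 is emulated by rounding float32 bit patterns, and the choice to round once after the reduction is an assumption about what a "fused" adder tree might do.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest BFloat16 value (ties to even).

    Emulation only: BFloat16 is treated as the top 16 bits of float32.
    NaN and infinity are not handled specially in this sketch.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Bias for round-to-nearest-even on the 16 bits that are dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


def hadamard(a, b):
    """Element-wise product of two equal-length vectors, rounded per element."""
    return [to_bfloat16(x * y) for x, y in zip(a, b)]


def tree_sum(values):
    """Pairwise (tree-shaped) reduction, rounded once at the end.

    Assumption: intermediate sums keep extra precision and only the final
    result is rounded to BFloat16.
    """
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0.0)  # pad so every level pairs up
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return to_bfloat16(vals[0])


def matvec(w, x):
    """Forward pass y = W x: one Hadamard product plus a tree sum per row."""
    return [tree_sum(hadamard(row, x)) for row in w]


def outer(delta, x):
    """Weight-gradient shape dW = delta x^T: each row is a Hadamard product
    of x with one element of delta broadcast across the row."""
    return [hadamard([d] * len(x), x) for d in delta]


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]
    x = [0.5, -1.0]
    print(matvec(w, x))           # [-1.5, -2.5]
    print(outer([1.0, 2.0], x))   # [[0.5, -1.0], [1.0, -2.0]]
```

The example inputs are exactly representable in BFloat16, so the printed results carry no rounding error. With general inputs, the per-element rounding in `hadamard` would introduce the precision loss that a hardware FP design has to account for.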
