A Run-time Tapered Floating-Point Adder/Subtractor Supporting Vectorization

Ashish Reddy Bommana, Srinivas Boppu
{"title":"A Run-time Tapered Floating-Point Adder/Subtractor Supporting Vectorization","authors":"Ashish Reddy Bommana, Srinivas Boppu","doi":"10.1109/MCSoC57363.2022.00056","DOIUrl":null,"url":null,"abstract":"In this era of widespread embedded computing, energy efficiency has become the new performance criterion; as a result, accelerator-rich multi-processor system-on-chips are widely utilized in embedded computing hardware. Due to abun-dant and inexpensive computational capacity, computationally intensive machine learning applications have gained a lot of traction and are currently being used in a wide range of application domains. Furthermore, there is an increasing trend toward developing hardware accelerators for machine learning applications for embedded edge devices where performance and energy efficiency are critical. Although floating-point operations are frequently used for accuracy in these hardware accelerators, reduced width floating point formats are also used to reduce hardware complexity and thus power consumption while pre-serving accuracy. Mixed-precision DNN, vectorization techniques, and any-precision DNN concepts have also proven to boost performance, energy efficiency, and memory bandwidth. In this paper, we propose the design of a vectorized floating-point adder/subtractor that can handle arbitrary length floating-point formats with varying exponent and mantissa widths. The whole idea of this paper is to bring flexibility to each layer in a DNN model for arithmetic operations; depending on the requirement of computation of each layer, exponent width and the floating-point format are chosen dynamically. In comparison to existing designs in the literature, the proposed design is $1.69\\times$ area and $1.61\\times$ power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC57363.2022.00056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this era of widespread embedded computing, energy efficiency has become the new performance criterion; as a result, accelerator-rich multi-processor systems-on-chip are widely used in embedded computing hardware. Owing to abundant and inexpensive computational capacity, computationally intensive machine learning applications have gained significant traction and are now used across a wide range of application domains. Furthermore, there is a growing trend toward developing hardware accelerators for machine learning on embedded edge devices, where performance and energy efficiency are critical. Although floating-point arithmetic is frequently used in these hardware accelerators for accuracy, reduced-width floating-point formats are also employed to reduce hardware complexity, and thus power consumption, while preserving accuracy. Mixed-precision DNNs, vectorization techniques, and any-precision DNN concepts have also been shown to improve performance, energy efficiency, and memory bandwidth. In this paper, we propose the design of a vectorized floating-point adder/subtractor that can handle arbitrary-length floating-point formats with varying exponent and mantissa widths. The central idea is to give each layer of a DNN model flexibility in its arithmetic operations: depending on the computational requirements of each layer, the exponent width and floating-point format are chosen dynamically. Compared with existing designs in the literature, the proposed design is $1.69\times$ more area-efficient and $1.61\times$ more power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths.
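To make the run-time format idea concrete, below is a minimal software sketch (not the paper's hardware design) of interpreting a bit pattern under a caller-chosen exponent width and mantissa width. The function name, the IEEE-754-style bias convention, and the handling of subnormals are illustrative assumptions; the point is only that the same stored word can be decoded under different exponent/mantissa splits selected at run time, which is the flexibility the proposed adder/subtractor exploits per DNN layer.

```python
# Illustrative software model of a run-time-configurable floating-point format.
# The paper's hardware supports arbitrary exponent (E) and mantissa (M) widths;
# this sketch only decodes such a format to a Python float for intuition.
# Names, the bias convention, and special-value handling are assumptions,
# not the paper's specification.

def decode_custom_float(bits: int, exp_width: int, man_width: int) -> float:
    """Decode a (1 + exp_width + man_width)-bit word as sign/exponent/mantissa."""
    bias = (1 << (exp_width - 1)) - 1                      # IEEE-754-style bias
    sign = -1.0 if (bits >> (exp_width + man_width)) & 1 else 1.0
    exp = (bits >> man_width) & ((1 << exp_width) - 1)     # biased exponent field
    frac = bits & ((1 << man_width) - 1)                   # fraction field
    if exp == 0:                                           # zero or subnormal
        return sign * (frac / (1 << man_width)) * 2.0 ** (1 - bias)
    return sign * (1.0 + frac / (1 << man_width)) * 2.0 ** (exp - bias)

# Example: the same 16-bit word interpreted with a bfloat16-like split (E=8, M=7)
# versus an IEEE half-precision-like split (E=5, M=10); the split is a run-time choice.
word = 0x3F80
print(decode_custom_float(word, exp_width=8, man_width=7))    # 1.0   under E=8, M=7
print(decode_custom_float(word, exp_width=5, man_width=10))   # 1.875 under E=5, M=10
```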