{"title":"支持向量化的运行时锥形浮点加/减法器","authors":"Ashish Reddy Bommana, Srinivas Boppu","doi":"10.1109/MCSoC57363.2022.00056","DOIUrl":null,"url":null,"abstract":"In this era of widespread embedded computing, energy efficiency has become the new performance criterion; as a result, accelerator-rich multi-processor system-on-chips are widely utilized in embedded computing hardware. Due to abun-dant and inexpensive computational capacity, computationally intensive machine learning applications have gained a lot of traction and are currently being used in a wide range of application domains. Furthermore, there is an increasing trend toward developing hardware accelerators for machine learning applications for embedded edge devices where performance and energy efficiency are critical. Although floating-point operations are frequently used for accuracy in these hardware accelerators, reduced width floating point formats are also used to reduce hardware complexity and thus power consumption while pre-serving accuracy. Mixed-precision DNN, vectorization techniques, and any-precision DNN concepts have also proven to boost performance, energy efficiency, and memory bandwidth. In this paper, we propose the design of a vectorized floating-point adder/subtractor that can handle arbitrary length floating-point formats with varying exponent and mantissa widths. The whole idea of this paper is to bring flexibility to each layer in a DNN model for arithmetic operations; depending on the requirement of computation of each layer, exponent width and the floating-point format are chosen dynamically. In comparison to existing designs in the literature, the proposed design is $1.69\\times$ area and $1.61\\times$ power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Run-time Tapered Floating-Point Adder/Subtractor Supporting Vectorization\",\"authors\":\"Ashish Reddy Bommana, Srinivas Boppu\",\"doi\":\"10.1109/MCSoC57363.2022.00056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this era of widespread embedded computing, energy efficiency has become the new performance criterion; as a result, accelerator-rich multi-processor system-on-chips are widely utilized in embedded computing hardware. Due to abun-dant and inexpensive computational capacity, computationally intensive machine learning applications have gained a lot of traction and are currently being used in a wide range of application domains. Furthermore, there is an increasing trend toward developing hardware accelerators for machine learning applications for embedded edge devices where performance and energy efficiency are critical. Although floating-point operations are frequently used for accuracy in these hardware accelerators, reduced width floating point formats are also used to reduce hardware complexity and thus power consumption while pre-serving accuracy. Mixed-precision DNN, vectorization techniques, and any-precision DNN concepts have also proven to boost performance, energy efficiency, and memory bandwidth. 
In this paper, we propose the design of a vectorized floating-point adder/subtractor that can handle arbitrary length floating-point formats with varying exponent and mantissa widths. The whole idea of this paper is to bring flexibility to each layer in a DNN model for arithmetic operations; depending on the requirement of computation of each layer, exponent width and the floating-point format are chosen dynamically. In comparison to existing designs in the literature, the proposed design is $1.69\\\\times$ area and $1.61\\\\times$ power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths.\",\"PeriodicalId\":150801,\"journal\":{\"name\":\"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MCSoC57363.2022.00056\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC57363.2022.00056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Run-time Tapered Floating-Point Adder/Subtractor Supporting Vectorization
In this era of widespread embedded computing, energy efficiency has become the new performance criterion; as a result, accelerator-rich multi-processor system-on-chips are widely used in embedded computing hardware. Owing to abundant and inexpensive computational capacity, computationally intensive machine learning applications have gained considerable traction and are now deployed across a wide range of application domains. Furthermore, there is a growing trend toward developing hardware accelerators for machine learning on embedded edge devices, where performance and energy efficiency are critical. Although floating-point arithmetic is commonly used in these accelerators for accuracy, reduced-width floating-point formats are also employed to lower hardware complexity, and thus power consumption, while preserving accuracy. Mixed-precision DNNs, vectorization techniques, and any-precision DNN concepts have likewise been shown to improve performance, energy efficiency, and memory-bandwidth utilization. In this paper, we propose the design of a vectorized floating-point adder/subtractor that can handle arbitrary-length floating-point formats with varying exponent and mantissa widths. The central idea is to give each layer of a DNN model flexibility in its arithmetic operations: the exponent width and the floating-point format are chosen dynamically according to each layer's computational requirements. Compared with existing designs in the literature, the proposed design is $1.69\times$ more area-efficient and $1.61\times$ more power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths.
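To make the idea of a run-time tapered format concrete, below is a minimal Python sketch, not taken from the paper, of decoding values whose exponent and mantissa widths are supplied at run time, together with a toy helper that splits a wide word into equal-width lanes to mimic vectorized operation. The sign|exponent|mantissa layout, the IEEE-754-style bias, and the function names are illustrative assumptions rather than the authors' hardware design.

```python
# Minimal sketch of a run-time tapered floating-point decode (illustrative only).
# Field widths (exp_bits, man_bits) are chosen at run time, e.g. per DNN layer.
# Layout assumption: sign | exponent | mantissa, with an IEEE-754-style bias.

def decode_tapered_float(bits: int, exp_bits: int, man_bits: int) -> float:
    """Interpret `bits` as a (1 + exp_bits + man_bits)-bit floating-point word."""
    bias = (1 << (exp_bits - 1)) - 1
    mantissa = bits & ((1 << man_bits) - 1)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    sign = -1.0 if (bits >> (man_bits + exp_bits)) & 1 else 1.0
    if exponent == 0:  # subnormal: no implicit leading one
        return sign * (mantissa / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)


def decode_lanes(word: int, total_bits: int, exp_bits: int, man_bits: int) -> list[float]:
    """Split a wide word into equal lanes and decode each one -- a toy model
    of vectorized operation over a fixed-width datapath."""
    lane_bits = 1 + exp_bits + man_bits
    lane_mask = (1 << lane_bits) - 1
    return [decode_tapered_float((word >> shift) & lane_mask, exp_bits, man_bits)
            for shift in range(0, total_bits, lane_bits)]


# The same 16-bit pattern reinterpreted under two different run-time formats.
word = 0x3DCD
print(decode_tapered_float(word, exp_bits=5, man_bits=10))  # half-precision-like split
print(decode_tapered_float(word, exp_bits=8, man_bits=7))   # bfloat16-like split

# Two half-precision-like lanes packed into one 32-bit word.
print(decode_lanes(0x3C003DCD, total_bits=32, exp_bits=5, man_bits=10))
```

In the example, the same 16-bit pattern is reinterpreted under a half-precision-like (1, 5, 10) split and a bfloat16-like (1, 8, 7) split, which mirrors the kind of per-layer, run-time format selection the abstract describes.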