Design of a Quadruple Precision Floating-Point Fused Multiply-Add Unit Based on 4-Way SIMD Device
Jun He, Biao Wang, Ying Zhu
2013 IEEE Eighth International Conference on Networking, Architecture and Storage, published 2013-07-17
DOI: 10.1109/NAS.2013.42 (https://doi.org/10.1109/NAS.2013.42)
Citation count: 0
Abstract
To reduce the hardware cost of a quadruple precision floating-point fused multiply-add (QPFMA) unit, a new QPFMA unit is designed and implemented on top of a 4-way SIMD device that supports 4 × 64-bit double precision floating-point fused multiply-add (DPFMA). The new QPFMA unit supports four kinds of FMA operations as well as multiplication, addition, subtraction, and comparison, with an operation latency of 7 cycles. The 113-bit × 113-bit multiplication of the quadruple precision fractions is decomposed into four 57-bit × 57-bit multiplications so that it can share the 53-bit × 53-bit multipliers of the 4-way SIMD DPFMA, which sharply reduces the hardware cost. Synthesis in a 65 nm cell library shows significant advantages in both area and latency: the unit runs at 1.1 GHz, occupies 42.71% of the area of a classic QPFMA unit, and has an operation latency 3 cycles shorter.
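
The decomposition described in the abstract can be illustrated at the arithmetic level with a minimal Python sketch. It checks only the identity behind splitting a 113-bit × 113-bit product into four partial products of at most 57 × 57 bits. The abstract does not say how the 57-bit operands are mapped onto the 53-bit × 53-bit multipliers, so that step is not modelled. The names `split` and `mul113_via_four_57` are hypothetical and chosen for illustration.

```python
FRAC_BITS = 113  # quadruple precision significand: 112 stored bits + hidden bit
HALF_BITS = 57   # width of the low half; the high half then needs only 56 bits
LOW_MASK = (1 << HALF_BITS) - 1


def split(x):
    """Split a 113-bit significand into (high, low) parts of at most 57 bits."""
    assert 0 <= x < (1 << FRAC_BITS)
    return x >> HALF_BITS, x & LOW_MASK


def mul113_via_four_57(a, b):
    """Multiply two 113-bit significands using four <=57-bit x <=57-bit products."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)

    # The four partial products; in hardware these would run on separate lanes.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Recombine with the appropriate weights: 2^114, 2^57, and 2^0.
    return (hi_hi << (2 * HALF_BITS)) + ((lo_hi + hi_lo) << HALF_BITS) + lo_lo


if __name__ == "__main__":
    a = (1 << 113) - 1        # largest 113-bit value
    b = (1 << 112) + 12345    # a normalized significand (hidden bit set)
    print(mul113_via_four_57(a, b) == a * b)
```

Because 113 = 56 + 57, both halves of each operand fit in 57 bits, which is consistent with the four 57-bit × 57-bit multiplications mentioned in the abstract.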