An efficient software implementation of correctly rounded operations extending FMA: a + b + c and a × b + c × d
C. Lauter
2017 51st Asilomar Conference on Signals, Systems, and Computers, October 2017
DOI: 10.1109/ACSSC.2017.8335379 (https://doi.org/10.1109/ACSSC.2017.8335379)
Abstract
In its 2008 revision, the IEEE 754 Standard for Floating-Point Arithmetic added the Fused-Multiply-And-Add (FMA) operation, which computes a × b + c without intermediate rounding. This operation enables faster scalar products and doubled-precision arithmetic. The IEEE 754 Standard is again undergoing revision. We propose an efficient software implementation of two additional operations: Fused-Multiply-Twice-And-Add, a × b + c × d, and Fused-Add-Add, a + b + c. Our implementation guarantees correct rounding in all rounding modes and IEEE 754-compliant signaling. Although intended for reference purposes, our software implementations are reasonably fast, with latencies of 94 and 104 cycles, respectively.
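To make "correct rounding" concrete, the following is a minimal sketch of what the two operations are meant to return. It is not the implementation described in the paper: it uses exact rational arithmetic from Python's standard library, which is slow, and it covers only finite inputs in round-to-nearest-even, without the other rounding modes, signed-zero handling, or exception signaling. The function names `fused_add_add` and `fused_multiply_twice_add` are chosen here for illustration.

```python
from fractions import Fraction


def fused_add_add(a: float, b: float, c: float) -> float:
    """Return a + b + c rounded once (finite inputs, round-to-nearest-even)."""
    # Fraction(x) converts a finite float exactly, so this sum has no error.
    exact = Fraction(a) + Fraction(b) + Fraction(c)
    # Converting back to float performs the single, final rounding.
    return float(exact)


def fused_multiply_twice_add(a: float, b: float, c: float, d: float) -> float:
    """Return a * b + c * d rounded once (finite inputs, round-to-nearest-even)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d)
    return float(exact)


if __name__ == "__main__":
    tiny = 2.0 ** -53  # half an ulp of 1.0 in binary64

    # Two rounded additions lose both small terms; one rounding keeps their sum.
    print((1.0 + tiny) + tiny, fused_add_add(1.0, tiny, tiny))

    # Rounding x * x first discards the 2**-54 term that the fused form keeps.
    x = 1.0 + 2.0 ** -27
    print(x * x + (-1.0) * 1.0, fused_multiply_twice_add(x, x, -1.0, 1.0))
```

In both demonstrations the unfused expression rounds more than once and loses low-order bits, while the exact-then-round version returns the floating-point number nearest to the true result. A production implementation would need to reach the same answers without arbitrary-precision arithmetic, which is where the cycle counts quoted above come in.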