Fused Multiply-Add Microarchitecture Comprising Separate Early-Normalizing Multiply and Add Pipelines

2011 IEEE 20th Symposium on Computer Arithmetic. Pub Date: 2011-07-25. DOI: 10.1109/ARITH.2011.25

D. Lutz

Citations: 23

Abstract

We present an IEEE 754-2008 and ARM-compliant floating-point microarchitecture that preserves the higher performance of separate multiply and add units while decreasing the effective latency of fused multiply-adds (FMAs). The multiplier supports subnormals in a novel and faster manner, shifting the partial products so that injection rounding can be used. The early-normalizing adder retains the low latency of a split-path near/far adder, but does so in a unified path with less area. The adder also allows rounding on effective subtractions involving one input that is twice the normal width, a necessary feature for handling FMAs. The resulting floating-point unit has about twice the instructions-per-cycle (IPC) performance of the best previous ARM design, and can be clocked at a higher speed despite the wider paths required by FMAs.
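The abstract names injection rounding without describing it. As a generic illustration only, and not the paper's circuit, the sketch below shows the usual idea in software: a rounding constant chosen by the rounding mode is added to the wide result, and the low bits are then truncated. The function name `round_by_injection`, the mode names, and the integer-significand model are assumptions made for this example.

```python
def round_by_injection(value: int, drop_bits: int, mode: str = "nearest_even") -> int:
    """Round a non-negative integer significand while dropping its low bits.

    Hypothetical helper for illustration; hardware would add the injection
    inside the multiplier's adder tree rather than as a separate step.
    """
    if drop_bits <= 0:
        return value
    mask = (1 << drop_bits) - 1
    if mode == "nearest_even":
        injection = 1 << (drop_bits - 1)  # half of one unit in the last place
    elif mode == "up":
        injection = mask                  # just under one unit in the last place
    elif mode == "truncate":
        injection = 0
    else:
        raise ValueError(f"unknown rounding mode: {mode}")

    total = value + injection
    result = total >> drop_bits
    # After a half-ulp injection, all-zero low bits mean the input was an
    # exact tie, so clear the last kept bit to land on the even neighbour.
    if mode == "nearest_even" and (total & mask) == 0:
        result &= ~1
    return result
```

For example, `round_by_injection(184, 4)` treats 184/16 = 11.5 as a tie and returns the even neighbour. The truncation position must be known before the addition, which is consistent with the abstract's remark that the partial products are shifted so that injection rounding can be used.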


