Improved Architectures for a Floating-Point Fused Dot Product Unit

2013 IEEE 21st Symposium on Computer Arithmetic | Pub Date: 2013-04-07 | DOI: 10.1109/ARITH.2013.26

Jongwook Sohn, E. Swartzlander

Citations: 42

Abstract

This paper presents improved architectures for a floating-point fused two-term dot product unit. Such a unit is useful for a wide variety of digital signal processing (DSP) applications, including complex multiplication and the butterfly operations of the fast Fourier transform (FFT) and discrete cosine transform (DCT). To improve performance, the designs apply a new alignment scheme, early normalization, four-input leading zero anticipation (LZA), a dual-path algorithm, and pipelining. The proposed designs are implemented for single precision and synthesized with a 45 nm standard cell library. The proposed dual-path design reduces latency by 25% compared to the traditional floating-point fused dot product unit. Based on a data flow analysis, the proposed design can be split into three pipeline stages. Because the latencies of the three stages are fairly well balanced, throughput increases by a factor of 2.8 compared to the non-pipelined dual-path design.
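To make the operation concrete, the following is a minimal behavioral sketch of a two-term dot product, a*b + c*d, in single precision. It assumes the usual meaning of "fused": the sum of the two products is formed exactly and rounded once, whereas a discrete implementation rounds after each multiply and after the add. It models only the arithmetic result, not the alignment, LZA, or dual-path hardware described in the abstract. The names `round_to_f32`, `fused_dot2`, `discrete_dot2`, and `complex_mul` are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def round_to_f32(x: Fraction) -> float:
    """Round an exact rational to the nearest binary32 value, ties to even.

    Assumptions of this sketch: overflow is not handled, and subnormals
    are handled by clamping the exponent to -126.
    """
    if x == 0:
        return 0.0
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    e = max(e, -126)
    ulp = Fraction(2) ** (e - 23)   # spacing of binary32 values in this binade
    n = round(x / ulp)              # round() on a Fraction rounds half to even
    return sign * float(n * ulp)    # exact: the value fits in a double


def fused_dot2(a: float, b: float, c: float, d: float) -> float:
    """a*b + c*d computed exactly, then rounded once (fused behavior)."""
    return round_to_f32(Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d))


def discrete_dot2(a: float, b: float, c: float, d: float) -> float:
    """a*b + c*d with a rounding after each multiply and after the add."""
    p = round_to_f32(Fraction(a) * Fraction(b))
    q = round_to_f32(Fraction(c) * Fraction(d))
    return round_to_f32(Fraction(p) + Fraction(q))


def complex_mul(a: float, b: float, c: float, d: float) -> tuple:
    """(a + bi)(c + di) expressed as two two-term dot products."""
    return fused_dot2(a, c, -b, d), fused_dot2(a, d, b, c)


if __name__ == "__main__":
    a = b = 1.0 + 2.0 ** -12        # exactly representable in binary32
    c, d = -1.0, 1.0 + 2.0 ** -11
    print(fused_dot2(a, b, c, d))
    print(discrete_dot2(a, b, c, d))
    print(complex_mul(1.0, 2.0, 3.0, 4.0))
```

With these inputs the exact result is 2^-24. The fused version returns it, while the discrete version returns 0 because the low-order bit of a*b is lost when the product is rounded before the addition.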
