An integrated reduction technique for a double precision accumulator

Krishna K. Nagar, Yan Zhang, J. Bakos
DOI: 10.1145/1646461.1646463 (https://doi.org/10.1145/1646461.1646463)
Journal: 高性能计算技术 (High Performance Computing Technology), volume 29 1, pages 11-18
Publication date: 2009-11-15
Publication type: Journal Article
Citation count: 5

Abstract

The accumulation operation, $A_{n+1} = A_n + X$, is perhaps one of the most fundamental and widely used operations in numerical mathematics and digital signal processing. However, designing double-precision floating-point accumulators presents a unique set of challenges: double-precision addition is usually deeply pipelined, and without special micro-architectural or data-scheduling techniques, the data hazard between $A_{n+1}$ and $A_n$ requires each new value of $X$ delivered to the accumulator to wait for the full latency of the adder. Several techniques have been proposed to alleviate this problem, but each carries significant overheads and/or restrictions on input characteristics. In this paper we present a design for a double-precision accumulator that requires no timing overhead relative to the underlying add operation. We achieve this by integrating a coalescing reduction circuit within the low-level design of a base-converting floating-point adder. To demonstrate our accumulator design, we use it in a sparse matrix-vector multiplication architecture, achieving a throughput of up to 3.7 GFLOPS.