REEL: Reducing effective execution latency of floating point operations

Proceedings of the 2013 International Symposium on Low Power Electronics and Design (ISLPED '13). Pub Date: 2013-09-04. DOI: 10.1109/ISLPED.2013.6629292

Vignyan Reddy Kothinti Naresh, S. Gilani, Erika Gunadi, N. Kim, M. Schulte, Mikko H. Lipasti

Pages: 187-192

Citations: 2

Abstract

The height of a program's dynamic dependence graph, as executed by a processor, sets a lower bound on its execution time. This height can be decreased by reducing the effective execution latency of the operations that form dependence chains in the graph. In this paper, we propose a technique called REEL that reduces the overall latency of chains of dependent floating-point (FP) operations by increasing the throughput of computation. REEL comprises a high-throughput floating-point unit (HFP) that allows early issue of an FP Add that depends on another FP Add or FP Multiply. This is complemented by instruction scheduler modifications that allow early issue of dependent FP Adds, and by novel checker logic that corrects any precision errors. Unlike conventional static operation fusion, such as fused multiply-add (FMA), REEL requires no changes to the instruction set to use the new hardware, and no recompilation. Furthermore, unlike ISA-level FMA, our technique produces bit-compatible results while boosting the performance of Add-Add dependence pairs in addition to Multiply-Add pairs. Our evaluation of REEL using the SPEC CFP2006 benchmarks shows an average performance gain of 7.6% and a maximum gain of 17%, while consuming 1.2% less energy.
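The first two sentences of the abstract describe a dataflow bound: ignoring issue width and functional-unit limits, a program cannot finish faster than the latency-weighted longest path through its dependence graph. The minimal sketch below computes that path for a small graph. The latencies (`LATENCY`) and the cycles saved by early issue (`EARLY_ISSUE_SAVING`) are hypothetical placeholders, not values from the paper, and the early-issue rule is only a crude stand-in for REEL's HFP and scheduler changes, not the authors' design.

```python
# Hypothetical per-operation latencies in cycles (placeholders, not from the paper).
LATENCY = {"fadd": 4, "fmul": 4}

# Hypothetical number of cycles by which a dependent FP add may issue ahead of
# its FP add / FP multiply producer's completion; a crude stand-in for REEL.
EARLY_ISSUE_SAVING = 2


def chain_height(ops, deps, early_issue=False):
    """Return the latency-weighted longest path (in cycles) through a
    dependence DAG: a lower bound on execution time that ignores issue
    width and functional-unit limits.

    ops  -- opcode names in program order (which is a topological order)
    deps -- deps[i] lists the indices (all < i) of ops whose results op i reads
    """
    finish = []  # finish[i] = cycle at which op i's result is available
    for i, op in enumerate(ops):
        start = 0
        for p in deps[i]:
            ready = finish[p]
            # Model early issue: an FP add that depends on an FP add or
            # FP multiply may start a few cycles before its producer finishes.
            if early_issue and op == "fadd" and ops[p] in ("fadd", "fmul"):
                ready -= EARLY_ISSUE_SAVING
            start = max(start, ready)
        finish.append(start + LATENCY[op])
    return max(finish, default=0)


# A serial FP reduction s = ((x0 + x1) + x2) + ...: each add reads the previous sum.
ops = ["fadd"] * 4
deps = [[], [0], [1], [2]]
print(chain_height(ops, deps))                    # 16
print(chain_height(ops, deps, early_issue=True))  # 10
```

With these made-up numbers, the four-add chain shrinks from 16 to 10 cycles. The figures only illustrate the mechanism (a shorter dependence chain lowers the bound); they say nothing about the gains reported in the abstract.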

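The bit-compatibility claim rests on a rounding difference: a fused multiply-add rounds once, after computing a*b + c exactly, whereas a separate FP Multiply followed by an FP Add rounds the product and then rounds the sum. Substituting FMA for a multiply-add pair can therefore change results. The sketch below demonstrates this generic IEEE 754 behavior; it is not REEL's checker logic, and the function names `mul_then_add` and `fused_multiply_add` are hypothetical. Single rounding is emulated with exact rationals from Python's standard `fractions` module.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate FP multiply and FP add: the product is rounded to double
    precision, then the sum is rounded again (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulates FMA semantics: a*b + c is computed exactly with rationals
    and rounded to double precision once. float() of a Fraction uses
    correctly rounded integer division."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30      # exactly representable as a double
c = -(1.0 + 2.0**-29)   # cancels the rounded product a*a
# Exact a*a = 1 + 2**-29 + 2**-60. The 2**-60 term is lost when the product
# alone is rounded to double, but survives a single final rounding.
print(mul_then_add(a, a, c))                      # 0.0
print(fused_multiply_add(a, a, c) == 2.0**-60)    # True
```

On Python 3.13 and later, `math.fma(a, a, c)` should agree with the rational emulation. Because the two results differ, a processor that silently fused such pairs would not be bit-compatible with unfused execution.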
