Optimized design of a double-precision floating-point multiply-add-fused unit for data dependence
Gongqiong Li, Zhaolin Li
2007 25th International Conference on Computer Design, pp. 311-316. Published 2007-10-01.
DOI: 10.1109/ICCD.2007.4601918 (https://doi.org/10.1109/ICCD.2007.4601918)
Citations: 0
Abstract
This paper presents a novel double-precision floating-point multiply-add-fused unit implemented in three pipeline stages. The main improvement over the conventional design is that data dependence between two consecutive floating-point instructions is taken into account. In the new design, if two consecutive floating-point instructions are data dependent, the intermediate results of the first instruction are preprocessed and fed back to the first stage, where the second instruction uses them directly. In this way, a floating-point instruction can execute immediately after its predecessor without stalling on the data dependence. Eleven data-dependence cases are accelerated. Experiments on four SPEC2000 benchmark programs show that a 25% performance increase can be attained at the cost of 0.27 ns of additional delay on the critical path.
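To make the stall-versus-feedback argument concrete, the following is a minimal sketch of a toy cycle-count model, not the authors' design. It assumes an in-order, three-stage unit in which, without feedback, a dependent instruction cannot enter stage 1 until its producer has left stage 3, whereas with feedback it enters one cycle after its producer. The function name `total_cycles` and the dependence pattern are hypothetical, and the model ignores both the 0.27 ns critical-path penalty and the distinction between the eleven dependence cases.

```python
PIPELINE_STAGES = 3  # from the abstract: the unit has three pipeline stages


def total_cycles(depends_on_previous, forwarding):
    """Cycles needed to finish a sequence of multiply-add instructions.

    depends_on_previous[i] is True when instruction i reads the result of
    instruction i-1 (the entry for the first instruction is ignored).
    Assumption (illustrative only): without forwarding, a dependent
    instruction waits until its producer has completed every stage; with
    forwarding, it issues in the very next cycle.
    """
    if not depends_on_previous:
        return 0

    issue_cycle = 1  # the first instruction enters stage 1 in cycle 1
    for dependent in depends_on_previous[1:]:
        if dependent and not forwarding:
            # Stall: wait for the producer to drain the whole pipeline.
            issue_cycle += PIPELINE_STAGES
        else:
            # Independent, or dependent with feedback to stage 1.
            issue_cycle += 1

    # The last instruction still has to traverse all stages.
    return issue_cycle + PIPELINE_STAGES - 1


# Hypothetical four-instruction sequence: the 2nd and 3rd depend on their predecessors.
pattern = [False, True, True, False]
stalled = total_cycles(pattern, forwarding=False)
bypassed = total_cycles(pattern, forwarding=True)
print(stalled, bypassed)
```

In this toy model, every dependent pair costs two extra cycles when stalled and none when the result is fed back, so the benefit grows with the fraction of back-to-back dependent instructions. The 25% figure reported in the abstract comes from the authors' SPEC2000 experiments and is not something this sketch reproduces.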