{"title":"Architecture Exploration of High-Performance Floating-Point Fused Multiply-Add Units and their Automatic Use in High-Level Synthesis","authors":"B. Liebig, Jens Huthmann, A. Koch","doi":"10.1109/IPDPSW.2013.106","DOIUrl":null,"url":null,"abstract":"Multiply-add operations form a crucial part of many digital signal processing and control engineering applications. Since their performance is crucial for the application-level speed-up, it is worthwhile to explore a wide spectrum of implementations alternatives, trading increased area/energy usage to speed-up units on the critical path of the computation. This paper examines existing solutions and proposes two new architectures for floating-point fused multiply-adds, and also considers the impact of different in-fabric features of recent FPGA architectures. The units rely on different degrees of carry-save arithmetic improve performance by up to 2.5x over the closest state-of-the-art competitor. They are evaluated at the application level by modifying an existing high-level synthesis system to automatically insert the new units for computations on the critical path of three different convex solvers.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2013.106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Multiply-add operations form a crucial part of many digital signal processing and control engineering applications. Since their performance is crucial for the application-level speed-up, it is worthwhile to explore a wide spectrum of implementations alternatives, trading increased area/energy usage to speed-up units on the critical path of the computation. This paper examines existing solutions and proposes two new architectures for floating-point fused multiply-adds, and also considers the impact of different in-fabric features of recent FPGA architectures. The units rely on different degrees of carry-save arithmetic improve performance by up to 2.5x over the closest state-of-the-art competitor. They are evaluated at the application level by modifying an existing high-level synthesis system to automatically insert the new units for computations on the critical path of three different convex solvers.