{"title":"高性能浮点融合乘加单元的体系结构探索及其在高级综合中的自动应用","authors":"B. Liebig, Jens Huthmann, A. Koch","doi":"10.1109/IPDPSW.2013.106","DOIUrl":null,"url":null,"abstract":"Multiply-add operations form a crucial part of many digital signal processing and control engineering applications. Since their performance is crucial for the application-level speed-up, it is worthwhile to explore a wide spectrum of implementations alternatives, trading increased area/energy usage to speed-up units on the critical path of the computation. This paper examines existing solutions and proposes two new architectures for floating-point fused multiply-adds, and also considers the impact of different in-fabric features of recent FPGA architectures. The units rely on different degrees of carry-save arithmetic improve performance by up to 2.5x over the closest state-of-the-art competitor. They are evaluated at the application level by modifying an existing high-level synthesis system to automatically insert the new units for computations on the critical path of three different convex solvers.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Architecture Exploration of High-Performance Floating-Point Fused Multiply-Add Units and their Automatic Use in High-Level Synthesis\",\"authors\":\"B. Liebig, Jens Huthmann, A. Koch\",\"doi\":\"10.1109/IPDPSW.2013.106\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multiply-add operations form a crucial part of many digital signal processing and control engineering applications. Since their performance is crucial for the application-level speed-up, it is worthwhile to explore a wide spectrum of implementations alternatives, trading increased area/energy usage to speed-up units on the critical path of the computation. This paper examines existing solutions and proposes two new architectures for floating-point fused multiply-adds, and also considers the impact of different in-fabric features of recent FPGA architectures. The units rely on different degrees of carry-save arithmetic improve performance by up to 2.5x over the closest state-of-the-art competitor. They are evaluated at the application level by modifying an existing high-level synthesis system to automatically insert the new units for computations on the critical path of three different convex solvers.\",\"PeriodicalId\":234552,\"journal\":{\"name\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2013.106\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2013.106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Architecture Exploration of High-Performance Floating-Point Fused Multiply-Add Units and their Automatic Use in High-Level Synthesis
Multiply-add operations form a crucial part of many digital signal processing and control engineering applications. Since their performance is crucial for the application-level speed-up, it is worthwhile to explore a wide spectrum of implementations alternatives, trading increased area/energy usage to speed-up units on the critical path of the computation. This paper examines existing solutions and proposes two new architectures for floating-point fused multiply-adds, and also considers the impact of different in-fabric features of recent FPGA architectures. The units rely on different degrees of carry-save arithmetic improve performance by up to 2.5x over the closest state-of-the-art competitor. They are evaluated at the application level by modifying an existing high-level synthesis system to automatically insert the new units for computations on the critical path of three different convex solvers.