A General-Purpose Method for Faithfully Rounded Floating-Point Function Approximation in FPGAs

2015 IEEE 22nd Symposium on Computer Arithmetic, pp. 42-49. Published 2015-06-22. DOI: 10.1109/ARITH.2015.27

David B. Thomas

Citations: 15

Abstract

A barrier to widespread use of Field Programmable Gate Arrays (FPGAs) has been the complexity of programming, but recent advances in High-Level Synthesis (HLS) have made it possible for non-experts to easily create floating-point numerical accelerators from C-like code. However, HLS users are limited to the set of numerical primitives provided by HLS vendors and designers of floating-point IP cores, and cannot easily implement new fast or accurate numerical primitives. This paper presents a method for automatically creating high-performance pipelined floating-point function approximations, which can be integrated as IP cores into numerical accelerators, whether derived from HLS or traditional design methods. Both input and output are floating-point, but internally the function approximator uses fixed-point polynomial segments, guaranteeing a faithfully rounded output. A robust and automated non-uniform segmentation scheme is used to segment any twice-differentiable input function and produce platform-independent VHDL. The approach is demonstrated across ten functions, which are automatically generated, then placed and routed in Xilinx devices. The method provides a 1.1x to 3x improvement in area over composite numerical approximations, while providing similar performance and significantly better relative error.
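The abstract's central idea is that fixed-point polynomial segments, placed non-uniformly, can keep the total error below one output ulp, so that every output is faithfully rounded (it is one of the two representable values bracketing the exact result). The following is a minimal software sketch of that idea under loudly stated assumptions; it is not the paper's algorithm. It uses degree-1 chords in place of the paper's polynomials, recursive bisection as a stand-in segmentation rule, a real input in [0, 1) instead of a floating-point input, a hypothetical error budget of 1/4 output ulp (so that a final round-to-nearest, at most 1/2 ulp, keeps the total under 1 ulp), and it checks accuracy only by sampling, not by proof. All names (`fit_linear`, `sampled_error`, `segment`, `evaluate`, `is_faithful`) are hypothetical.

```python
# Sketch only: hypothetical helpers illustrating faithful rounding with
# non-uniform fixed-point segments. Not the paper's method.
import bisect
import math
from typing import Callable, List, Tuple

# A segment (x0, x1, c0, c1) approximates f(x) ~ (c0 + c1 * (x - x0)) / 2**frac
# on [x0, x1); c0 and c1 are integers, i.e. fixed-point coefficients.
Segment = Tuple[float, float, int, int]


def fit_linear(f: Callable[[float], float], x0: float, x1: float, frac: int) -> Segment:
    """Fixed-point chord through the segment end points (a degree-1 polynomial)."""
    scale = 1 << frac
    c0 = round(f(x0) * scale)
    c1 = round((f(x1) - f(x0)) / (x1 - x0) * scale)
    return (x0, x1, c0, c1)


def sampled_error(f: Callable[[float], float], seg: Segment, frac: int, samples: int = 64) -> float:
    """Largest |approx - f| over evenly spaced sample points (an estimate, not a bound)."""
    x0, x1, c0, c1 = seg
    worst = 0.0
    for i in range(samples + 1):
        x = x0 + (x1 - x0) * i / samples
        approx = (c0 + c1 * (x - x0)) / (1 << frac)
        worst = max(worst, abs(approx - f(x)))
    return worst


def segment(f: Callable[[float], float], lo: float, hi: float, out_bits: int,
            guard: int = 4, depth: int = 24) -> List[Segment]:
    """Halve [lo, hi) until each chord's sampled error is below 1/4 output ulp.

    Segments end up narrow only where f has high curvature (non-uniform).
    If depth runs out, the last segment is returned even if it misses the budget.
    """
    frac = out_bits + guard                # internal precision: guard bits beyond the output
    tol = 0.25 / (1 << out_bits)           # hypothetical budget: 1/4 ulp of the output
    seg = fit_linear(f, lo, hi, frac)
    if depth == 0 or sampled_error(f, seg, frac) < tol:
        return [seg]
    mid = (lo + hi) / 2
    return (segment(f, lo, mid, out_bits, guard, depth - 1)
            + segment(f, mid, hi, out_bits, guard, depth - 1))


def evaluate(segs: List[Segment], x: float, out_bits: int, guard: int = 4) -> int:
    """Select the segment containing x, evaluate it, round to out_bits fraction bits."""
    starts = [s[0] for s in segs]          # segments are produced in ascending order
    x0, _, c0, c1 = segs[max(0, bisect.bisect_right(starts, x) - 1)]
    internal = (c0 + c1 * (x - x0)) / (1 << (out_bits + guard))
    return round(internal * (1 << out_bits))  # round to nearest (ties to even)


def is_faithful(f: Callable[[float], float], segs: List[Segment], x: float,
                out_bits: int, guard: int = 4) -> bool:
    """Faithful rounding: the output is one of the two grid points around f(x)."""
    exact = f(x) * (1 << out_bits)
    y = evaluate(segs, x, out_bits, guard)
    return math.floor(exact) <= y <= math.ceil(exact)


if __name__ == "__main__":
    segs = segment(math.exp, 0.0, 1.0, out_bits=12)
    print(len(segs), "segments")
    print(all(is_faithful(math.exp, segs, i / 4096, 12) for i in range(4096)))
```

For `math.exp` on [0, 1) with 12 output fraction bits, the segments come out narrower toward x = 1, where exp has more curvature, which illustrates why a non-uniform scheme needs fewer table entries than a uniform one at the same accuracy. In hardware the segment would typically be selected from leading input bits rather than by a `bisect` search; that substitution is part of this sketch's assumptions.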
