2018 IEEE 25th Symposium on Computer Arithmetic (ARITH): Latest Publications

Proceedings of the 25th International Symposium on Computer Arithmetic

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/arith.2018.8464697

Citations: 0

A Formally-Proved Algorithm to Compute the Correct Average of Decimal Floating-Point Numbers

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464761

S. Boldo, Florian Faissole, Vincent Tourneur

Citations: 2

Karatsuba with Rectangular Multipliers for FPGAs

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464809

M. Kumm, O. Gustafsson, F. de Dinechin, Johannes Kappauf, P. Zipf

Citations: 10

Digit Elision for Arbitrary-accuracy Iterative Computation

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464691

He Li, James J. Davis, John Wickerson, G. Constantinides

Abstract: We recently proposed the first hardware architecture enabling the iterative solution of systems of linear equations to accuracies limited only by the amount of available memory. This technique, named ARCHITECT, achieves exact numeric computation by using online arithmetic to allow the refinement of results from earlier iterations over time, eschewing rounding error. ARCHITECT has a key drawback, however: often, many more digits than strictly necessary are generated, with this problem exacerbating the more accurate a solution is sought. In this paper, we infer the locations of these superfluous digits within stationary iterative calculations by exploiting online arithmetic's digit dependencies and using forward error analysis. We demonstrate that their lack of computation is guaranteed not to affect the ability to reach a solution of any accuracy. Versus ARCHITECT, our illustrative hardware implementation achieves a geometric mean 20.1× speedup in the solution of a set of representative linear systems through the avoidance of redundant digit calculation. For the computation of high-precision results, we also obtain an up-to 22.4× memory requirement reduction over the same baseline. Finally, we demonstrate that solvers implemented following our proposals can show superiority over conventional arithmetic implementations by virtue of their runtime-tunable precisions.

Pages: 107-114

Citations: 3

VeriTracer: Context-enriched tracer for floating-point arithmetic analysis

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464687

Yohan Chatelain, P. D. O. Castro, E. Petit, D. Defour, J. Bieder, M. Torrent

Citations: 8

High Density and Performance Multiplication for FPGA

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464695

M. Langhammer, Gregg Baeckler

Abstract: Arithmetic based applications are one of the most common use cases for modern FPGAs. Currently, machine learning is emerging as the fastest growth area for FPGAs, renewing an interest in low precision multiplication. There is now a new focus on multiplication in the soft fabric: very high-density systems, consisting of many thousands of operations, are the current norm. In this paper we introduce multiplier regularization, which restructures common multiplier algorithms into smaller, and more efficient architectures. The multiplier structure is parameterizable, and results are given for a continuous range of input sizes, although the algorithm is most efficient for small input precisions. The multiplier is particularly effective for typical machine learning inferencing uses, and the presented cores can be used for dot products required for these applications. Although the examples presented here are optimized for Intel Stratix 10 devices, the concept of regularized arithmetic structures is applicable to generic FPGA LUT architectures. Results are compared to Intel Megafunction IP as well as contrasted with normalized representations of recently published results for Xilinx devices. We report a 10% to 35% smaller area, and a more significant latency reduction, in the range of 25% to 50%, for typical inferencing use cases.

Pages: 5-12

Citations: 17

Flexpoint: Predictive Numerics for Deep Learning

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464801

Valentina Popescu, M. Nassar, Xin Wang, E. Tumer, T. Webb

Abstract: Deep learning has been undergoing rapid growth in recent years thanks to its state-of-the-art performance across a wide range of real-world applications. Traditionally, neural networks were trained in IEEE-754 binary64 or binary32 format, a common practice in general scientific computing. However, the unique computational requirements of deep neural network training workloads allow for much more efficient and inexpensive alternatives, unleashing a new wave of numerical innovations powering specialized computing hardware. We previously presented Flexpoint, a blocked fixed-point data type combined with a novel predictive exponent management algorithm designed to support training of deep networks without modifications, aiming at a seamless replacement of the binary32 format widely used in practice today. We showed that Flexpoint with 16-bit mantissa and 5-bit shared exponent (flex16+5) achieved numerical parity to binary32 in training a number of convolutional neural networks. In the current paper we review the continuing trend of predictive numerics enhancing deep neural network training in specialized computing devices such as the Intel® Nervana™ Neural Network Processor.

Pages: 1-4
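The blocked fixed-point idea in the abstract above, integer mantissas that share one exponent per block, can be illustrated with a minimal Python sketch. The function names `block_quantize` and `block_dequantize` are hypothetical. The exponent here is chosen directly from the largest magnitude in the block; this is a simplification and not the predictive exponent management algorithm the paper describes, and the 5-bit range of the shared exponent is not enforced.

```python
import math


def block_quantize(values, mantissa_bits=16):
    """Quantize floats to signed integer mantissas sharing one exponent.

    Illustrative only: the exponent is derived from the current block,
    not predicted from past statistics as in Flexpoint.
    """
    limit = 2 ** (mantissa_bits - 1) - 1  # largest signed mantissa
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # Smallest exponent e such that max_abs / 2**e fits in the mantissa.
    exponent = math.ceil(math.log2(max_abs / limit))
    scale = 2.0 ** exponent
    # Round to nearest and clamp to guard against log2 rounding error.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return mantissas, exponent


def block_dequantize(mantissas, exponent):
    """Recover approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]
```

Because every element of the block is scaled by the same power of two, small entries in a block with one large entry lose relative precision, which is why the choice of exponent matters so much for this kind of format.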

Citations: 20

Combining Restoring Array and Logarithmic Dividers into an Approximate Hybrid Design

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464807

Weiqiang Liu, Jing Li, Tao Xu, Chenghua Wang, P. Montuschi, F. Lombardi

Citations: 18

Approximate Fixed-Point Elementary Function Accelerator for the SpiNNaker-2 Neuromorphic Chip

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464785

M. Mikaitis, D. Lester, D. Shang, S. Furber, Gengting Liu, J. Garside, Stefan Scholze, S. Höppner, Andreas Dixius

Abstract: Neuromorphic chips are used to model biologically inspired Spiking Neural Networks (SNNs) where most models are based on differential equations. Equations for most SNN algorithms usually contain variables with one or more $e^{x}$ components. SpiNNaker is a digital neuromorphic chip that has so far been using pre-calculated look-up tables for the exponential function. However, this approach is limited because the memory requirements grow as more complex neural models are developed. To save already limited memory resources in the next generation SpiNNaker chip, we are including a fast exponential function in the silicon. In this paper we analyse iterative algorithms for elementary functions and show how to build a single hardware accelerator for exp and natural log, for a neuromorphic chip prototype, to be manufactured in a 22 nm FDSOI process. We present the accelerator that has algorithmic level approximation control, allowing it to trade precision for latency and energy efficiency. As an addition to the neuromorphic chip application, we provide analysis of a parameterized elementary function unit that can be tailored for other systems with different power, area, accuracy and latency constraints.

Pages: 37-44
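To make the idea of an iterative elementary-function algorithm with a precision/latency trade-off concrete, here is a sketch of one classic shift-and-add scheme for the exponential. It is an assumption that this is representative: the abstract does not say which iteration the accelerator uses. The function name `exp_shift_add`, the use of floating point to model what would be fixed-point hardware, and the restricted input range are all illustrative choices.

```python
import math


def exp_shift_add(x, iterations=32):
    """Approximate e**x for 0 <= x <= 1.56 by a shift-and-add iteration.

    Each step tests the constant ln(1 + 2**-k). If it fits in the
    remaining argument, it is subtracted and the result is multiplied
    by (1 + 2**-k), which in hardware is one shift and one add.
    Fewer iterations mean lower latency and lower accuracy.
    """
    if not 0.0 <= x <= 1.56:
        raise ValueError("x outside the convergence range of this sketch")
    remainder = x
    result = 1.0
    for k in range(iterations):
        # In hardware these constants would come from a small table.
        constant = math.log(1.0 + 2.0 ** -k)
        if remainder >= constant:
            remainder -= constant
            result += result * 2.0 ** -k  # result * (1 + 2**-k)
    return result
```

Calling `exp_shift_add(1.0)` with the default 32 iterations agrees with `math.e` to roughly nine decimal places, while `exp_shift_add(1.0, iterations=8)` is only accurate to about two, which mirrors the kind of approximation control the abstract mentions.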

Citations: 13

New Area Record for the AES Combined S-Box/Inverse S-Box

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464780

A. Reyhani-Masoleh, Mostafa M. I. Taha, Doaa Ashmawy

Citations: 12