2015 IEEE 22nd Symposium on Computer Arithmetic最新文献_第2页

Low-Cost Duplicate Multiplication 低成本重复乘法

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.29

Michael B. Sullivan, E. Swartzlander

引用次数: 2

Reliable Evaluation of the Worst-Case Peak Gain Matrix in Multiple Precision 多精度下最坏情况峰值增益矩阵的可靠评估

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.14

Anastasia Volkova, Thibault Hilaire, C. Lauter

引用次数: 17

The Exact Real Arithmetical Algorithm in Binary Continued Fractions 二元连分数的精确实数算法

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.20

P. Kurka

引用次数: 2

Efficient Divide-and-Conquer Multiprecision Integer Division 高效分治多精度整数除法

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.19

William Bruce Hart

引用次数: 1

Semi-Automatic Floating-Point Implementation of Special Functions 特殊函数的半自动浮点实现

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.12

C. Lauter, M. Mezzarobba

{"title":"Semi-Automatic Floating-Point Implementation of Special Functions","authors":"C. Lauter, M. Mezzarobba","doi":"10.1109/ARITH.2015.12","DOIUrl":"https://doi.org/10.1109/ARITH.2015.12","url":null,"abstract":"This work introduces an approach to the computer-assisted implementation of mathematical functions geared toward special functions such as those occurring in mathematical physics. The general idea is to start with an exact symbolic representation of a function and automate as much as possible of the process of implementing it. In order to deal with a large class of special functions, our symbolic representation is an implicit one: the input is a linear differential equation with polynomial coefficients along with initial values. The output is a C program to evaluate the solution of the equation using domain splitting, argument reduction and polynomial approximations in double-precision arithmetic, in the usual style of mathematical libraries. Our generation method combines symbolic-numeric manipulations of linear ODEs with interval-based tools for the floating-point implementation of \"black-box\" functions. We describe a prototype code generator that can automatically produce implementations on moderately large intervals. Implementations on the whole real line are possible in some cases but require manual tool setup and code integration. Due to this limitation and as some heuristics remain, we refer to our method as \"semi-automatic\" at this stage. Along with other examples, we present an implementation of the Voigt profile with fixed parameters that may be of independent interest.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"93 1","pages":"58-65"},"PeriodicalIF":0.0,"publicationDate":"2015-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84167115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Modulo-(2^n -- 2^q -- 1) Parallel Prefix Addition via Excess-Modulo Encoding of Residues 残数的过模编码的模-(2^n—2^q—1)并行前缀加法

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.9

Seyed Hamed Fatemi Langroudi, G. Jaberipur

{"title":"Modulo-(2^n -- 2^q -- 1) Parallel Prefix Addition via Excess-Modulo Encoding of Residues","authors":"Seyed Hamed Fatemi Langroudi, G. Jaberipur","doi":"10.1109/ARITH.2015.9","DOIUrl":"https://doi.org/10.1109/ARITH.2015.9","url":null,"abstract":"The residue number system t = {2n - 1, 2n, 2n + 1} has been extensively studied towards perfection in realization of efficient parallel prefix modular adders, with (3 + 2logn △G latency. Many applications, such as digital signal processing require fast modular operations. However, relying only on t limits the magnitude of n, and accordingly the dynamic range. Therefore, additional mutually prime moduli are required to accommodate for wider dynamic range. On the other hand, speed of modular arithmetic operations for the additional moduli should be as close as possible to those in t. This could be best met by the moduli of the form 2n - (2q + 1), with 1 ≤ q ≤ n - 2, such as 2n - 3, 2n - 5. However, the fastest parallel prefix realization of modulo-(2n - 2q - 1) adders that we have encountered in the relevant literature, claims (7 + 2 log n)△G latency. Motivated by the need to reduce the latter, we propose new designs of such adders with (5 + 2 log n)△G latency without any penalty in area consumption or power dissipation. The proposed modular addition algorithm entails supplementary representation of residues in [0,2q], as [2n - (2q + 1), 2n - 1]. This leads to additional performance efficiency similar to the effect of double zero representation in modulo-(2n - 1) adders. The aforementioned analytically evaluated speed gain and improvements in other figures of merit are also supported via circuit simulation and synthesis.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"2 1","pages":"121-128"},"PeriodicalIF":0.0,"publicationDate":"2015-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83098151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Design and Implementation of an Embedded FPGA Floating Point DSP Block 嵌入式FPGA浮点DSP模块的设计与实现

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.18

M. Langhammer, B. Pasca

{"title":"Design and Implementation of an Embedded FPGA Floating Point DSP Block","authors":"M. Langhammer, B. Pasca","doi":"10.1109/ARITH.2015.18","DOIUrl":"https://doi.org/10.1109/ARITH.2015.18","url":null,"abstract":"This paper describes the architecture and implementation, from both the standpoint of target applications as well as circuit design, of an FPGA DSP Block that can efficiently support both fixed and single precision (SP) floating-point (FP) arithmetic. Most contemporary FPGAs embed DSP blocks that provide simple multiply-add-based fixed-point arithmetic cores. Current FP arithmetic FPGA solutions make use of these hardened DSP resources, together with embedded memory blocks and soft logic resources, however, larger systems cannot be efficiently implemented due to the routing and soft logic limitations on the devices, resulting in significant area, performance, and power consumption penalties compared to ASIC implementations. In this paper we analyse earlier proposed embedded FP implementations, and show why they are not suitable for a production FPGA. We contrast these against our solution -- a unified DSP Block -- where (a) the SP FP multiplier is overlaid on the fixed point constructs, (b) the SP FP Adder/Subtracter is integrated as a separate unit, and (c) the multiplier and adder can be combined in a way that is both arithmetically useful, but also efficient in terms of FPGA routing density and congestion. In addition, a novel way of seamlessly combining any number of DSP Blocks in a low latency structure will be introduced. We will show that this new approach allows a low cost, low power, and high density FP platform on current production 20nm FPGAs. We also describe a future enhancement of the DSP block that can support subnormal numbers.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"128 1","pages":"26-33"},"PeriodicalIF":0.0,"publicationDate":"2015-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87912391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

The end of numerical error 结束数值误差

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.34

J. Gustafson

{"title":"The end of numerical error","authors":"J. Gustafson","doi":"10.1109/ARITH.2015.34","DOIUrl":"https://doi.org/10.1109/ARITH.2015.34","url":null,"abstract":"Summary form only given, as follows. The full paper was not made available as part of this conference proceedings. It is time to overthrow a century of methods based on floating point arithmetic. Current technical computing is based on the acceptance of rounding error using numerical representations that were invented in 1914, and acceptance of sampling error using algorithms designed for a time when transistors were very expensive. By sticking to an antiquated storage format (now codified as an IEEE standard) well into the exascale area, we are wasting power, energy, storage, bandwidth, and programmer effort. The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second. We introduce the 'unum' (universal number), a superset of IEEE Floating Point, that contains extra metadata fields that actually save storage, yet give more accurate answers that do not round, overflow, or underflow. The potential they offer for improved programmer productivity is enormous. They also provide, for the first time, the hope of a numerical standard that guarantees bitwise identical results across different computer architectures. Unum format is the basis for the 'ubox' method, which redefines what is meant by \"high performance\" by measuring performance in terms of the knowledge obtained about the answer and not the operations performed per second. Examples are given for practical application to structural analysis, radiation transfer, the n-body problem, linear and nonlinear systems of equations, and Laplace’s equation. This is a fresh approach to scientific computing that allows proper, rigorous representation of real number sets for the first time.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"35 1","pages":"74"},"PeriodicalIF":0.0,"publicationDate":"2015-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89332574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Reproducible Tall-Skinny QR 可重复的高瘦QR

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.28

Hong Diep Nguyen, J. Demmel

{"title":"Reproducible Tall-Skinny QR","authors":"Hong Diep Nguyen, J. Demmel","doi":"10.1109/ARITH.2015.28","DOIUrl":"https://doi.org/10.1109/ARITH.2015.28","url":null,"abstract":"Reproducibility is the ability to obtain bitwise identical results from different runs of the same program on the same input data, regardless of the available computing resources, or how they are scheduled. Recently, techniques have been proposed to attain reproducibility for BLAS operations, all of which rely on reproducibly computing the floating-point sum and dot product. Nonetheless, a reproducible BLAS library does not automatically translate into a reproducible higher-level linear algebra library, especially when communication is optimized. For instance, for the QR factorization, conventional algorithms such as Householder transformation or Gram-Schmidt process can be used to reproducibly factorize a floating-point matrix by fixing the high-level order of computation, for example column-by-column from left to right, and by using reproducible versions of level-1 BLAS operations such as dot product and 2-norm. In a massively parallel environment, those algorithms have high communication cost due to the need for synchronization after each step. The Tall-Skinny QR algorithm obtains much better performance in massively parallel environments by reducing the number of messages by a factor of n to O(log(P)) where P is the processor count, by reducing the number of reduction operations to O(1). Those reduction operations however are highly dependent on the network topology, in particular the number of computing nodes, and therefore are difficult to implement reproducibly and with reasonable performance. In this paper we present a new technique to reproducibly compute a QR factorization for a tall skinny matrix, which is based on the Cholesky QR algorithm to attain reproducibility as well as to improve communication cost, and the iterative refinement technique to guarantee the accuracy of the computed results. Our technique exhibits strong scalability in massively parallel environments, and at the same time can provide results of almost the same accuracy as the conventional Householder QR algorithm unless the matrix is extremely badly conditioned, in which case a warning can be given. Initial experimental results in Matlab show that for not too ill-conditioned matrices whose condition number is smaller than sqrt(1/e) where e is the machine epsilon, our technique runs less than 4 times slower than the built-in Matlab qr() function, and always computes numerically stable results in terms of column-wise relative error.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"31 1","pages":"152-159"},"PeriodicalIF":0.0,"publicationDate":"2015-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85481966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

Code Generators for Mathematical Functions 数学函数的代码生成器

2015 IEEE 22nd Symposium on Computer Arithmetic Pub Date : 2015-06-22 DOI: 10.1109/ARITH.2015.22

Nicolas Brunie, F. D. Dinechin, O. Kupriianova, C. Lauter

{"title":"Code Generators for Mathematical Functions","authors":"Nicolas Brunie, F. D. Dinechin, O. Kupriianova, C. Lauter","doi":"10.1109/ARITH.2015.22","DOIUrl":"https://doi.org/10.1109/ARITH.2015.22","url":null,"abstract":"A typical floating-point environment includes support for a small set of about 30 mathematical functions such as exponential, logarithm, trigonometric and hyperbolic functions. These functions are provided by mathematical software libraries (libm), typically in IEEE754 single, double and quad precision. This article suggests to replace this libm paradigm by a more general approach: the on-demand generation of numerical function code, on arbitrary domains and with arbitrary accuracies. First, such code generation opens up the libm function space available to programmers. It may capture a much wider set of functions, and may capture even standard functions on non-standard domains and accuracy/performance points. Second, writing libm code requires fine-tuned instruction selection and scheduling for performance, and sophisticated floating-point techniques for accuracy. Automating this task through code generation improves confidence in the code while enabling better design space exploration, and therefore better time to market, even for the libm functions. This article discusses the new challenges of this paradigm shift, and presents the current state of open-source function code generators available on http://www.metalibm.org/.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"12 1","pages":"66-73"},"PeriodicalIF":0.0,"publicationDate":"2015-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79130514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 33