2018 IEEE 25th Symposium on Computer Arithmetic (ARITH): Latest Papers

Augmented Arithmetic Operations Proposed for IEEE-754 2018
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464813
Jason Riedy, J. Demmel
{"title":"Augmented Arithmetic Operations Proposed for IEEE-754 2018","authors":"Jason Riedy, J. Demmel","doi":"10.1109/ARITH.2018.8464813","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464813","url":null,"abstract":"Algorithms for extending arithmetic precision through compensated summation or arithmetics like double-double rely on operations commonly called twoSum and twoProduct. The current draft of the IEEE 754 standard specifies these operations under the names augmentedAddition and augmentedMultiplication. These operations were included after three decades of experience because of a motivating new use: bitwise reproducible arithmetic. Standardizing the operations provides a hardware acceleration target that can provide at least a 33% speed improvement in reproducible dot product, placing reproducible dot product almost within a factor of two of common dot product. This paper provides history and motivation for standardizing these operations. We also define the operations, explain the rationale for all the specific choices, and provide parameterized test cases for new boundary behaviors.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"286 1","pages":"45-52"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73257723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
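The twoSum operation named in this abstract can be sketched in a few lines. The version below is Knuth's classic branch-free formulation, written in Python, whose floats are IEEE 754 binary64 with round-to-nearest-even. The function name `two_sum` and the `Fraction`-based exactness check are illustrative choices, not taken from the paper. The standardized augmentedAddition rounds ties toward zero, so it can differ from this sketch in halfway cases.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple:
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly (barring overflow)."""
    s = a + b
    b_virtual = s - a            # the part of b that actually ended up in s
    a_virtual = s - b_virtual    # the part of a that actually ended up in s
    b_error = b - b_virtual      # what was lost from b
    a_error = a - a_virtual      # what was lost from a
    return s, a_error + b_error


def is_exact(a: float, b: float) -> bool:
    """Check the error-free property with exact rational arithmetic."""
    s, e = two_sum(a, b)
    return Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b)
```

For example, `two_sum(1.0, 2.0**-60)` returns `(1.0, 2.0**-60)`: the small addend is lost from the rounded sum but recovered exactly in the error term.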
On Various Ways to Split a Floating-Point Number
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464793
C. Jeannerod, J. Muller, P. Zimmermann
{"title":"On Various Ways to Split a Floating-Point Number","authors":"C. Jeannerod, J. Muller, P. Zimmermann","doi":"10.1109/ARITH.2018.8464793","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464793","url":null,"abstract":"We review several ways to split a floating-point number, that is, to decompose it into the exact sum of two floating-point numbers of smaller precision. All the methods considered here involve only a few IEEE floating-point operations, with rounding to nearest and including possibly the fused multiply-add (FMA). Applications range from the implementation of integer functions such as round and floor to the computation of suitable scaling factors aimed, for example, at avoiding spurious underflows and overflows when implementing functions such as the hypotenuse.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"42 1","pages":"53-60"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72779520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
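One well-known way to split a floating-point number with a handful of round-to-nearest operations is Veltkamp's splitting. The sketch below shows it for binary64 in Python. It is offered as an illustration of the kind of method the abstract describes, and it is an assumption that it matches any particular algorithm analysed in the paper. The name `veltkamp_split` and the default `s = 27` are illustrative.

```python
def veltkamp_split(x: float, s: int = 27) -> tuple:
    """Split binary64 x into (hi, lo) with hi + lo == x exactly.

    With s = 27, hi and lo each fit in 26 significant bits, so products
    such as hi * hi are computed without rounding error.
    Assumes (2**s + 1) * x does not overflow.
    """
    c = float(2**s + 1)      # splitting constant
    gamma = c * x            # rounded product
    delta = x - gamma
    hi = gamma + delta       # high-order part of x
    lo = x - hi              # remainder, computed exactly
    return hi, lo
```

A typical use is error-free multiplication on hardware without FMA: splitting both operands makes every partial product exact.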
A New Variant of the Barrett Algorithm Applied to Quotient Selection
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464771
Niall Emmart, Fangyu Zheng, C. Weems
{"title":"A New Variant of the Barrett Algorithm Applied to Quotient Selection","authors":"Niall Emmart, Fangyu Zheng, C. Weems","doi":"10.1109/ARITH.2018.8464771","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464771","url":null,"abstract":"Quotient Selection (QS) is a key step in the classic $O(n^{2})$ multiple precision division algorithm. On processors with fast hardware division, it is a trivial problem, but on GPUs, division is quite slow. In this paper we investigate the effectiveness of Brent and Zimmermann's variant as well as our own novel variant of Barrett's algorithm. Our new approach is shown to be suitable for low radix (single precision) QS. Three highly optimized implementations, two of the Brent and Zimmermann variant and one based on our new approach, have been developed and we show that each is many times faster than using the division operation built in to the compiler. In addition, our variant is on average 22% faster than the other two implementations. We also sketch proofs of correctness for all of the implementations and our new algorithm.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"18 1","pages":"138-144"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77231484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
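For background, the textbook form of Barrett's algorithm replaces division by a multiplication with a precomputed reciprocal, followed by a small correction. The Python sketch below shows that classic form only. It is not the Brent and Zimmermann variant and not the authors' new variant, and the function names are hypothetical.

```python
def barrett_setup(m: int) -> tuple:
    """Precompute (k, mu) for modulus m >= 1, where mu = floor(4**k / m)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m   # the only true division, done once per modulus
    return k, mu


def barrett_divmod(x: int, m: int, k: int, mu: int) -> tuple:
    """Return (x // m, x % m) for 0 <= x < m * m using multiply, shift, subtract."""
    q = (x * mu) >> (2 * k)    # quotient estimate; never larger than the true quotient
    r = x - q * m
    while r >= m:              # the estimate is low by a small amount at most
        q += 1
        r -= m
    return q, r
```

The precomputation is amortized when many values are reduced by the same modulus, which is the situation on hardware where division is slow.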
A High Throughput Polynomial and Rational Function Approximations Evaluator
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464778
N. Brisebarre, G. Constantinides, Milos Ercegovac, Silviu-Ioan Filip, Matei Iştoan, J. Muller
{"title":"A High Throughput Polynomial and Rational Function Approximations Evaluator","authors":"N. Brisebarre, G. Constantinides, Milos Ercegovac, Silviu-Ioan Filip, Matei Iştoan, J. Muller","doi":"10.1109/ARITH.2018.8464778","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464778","url":null,"abstract":"We present an automatic method for the evaluation of functions via polynomial or rational approximations and its hardware implementation on FPGAs. These approximations are evaluated using Ercegovac's iterative E-method adapted for FPGA implementation. The polynomial and rational function coefficients are optimized such that they satisfy the constraints of the E-method. We present several examples of practical interest; in each case a resource-efficient approximation is proposed and comparisons are made with alternative approaches.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"63 1","pages":"99-106"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84311677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Enhanced Vector Math Support on the Intel®AVX-512 Architecture
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464794
Cristina S. Anderson, Jingwei Zhang, Marius Cornea
{"title":"Enhanced Vector Math Support on the Intel®AVX-512 Architecture","authors":"Cristina S. Anderson, Jingwei Zhang, Marius Cornea","doi":"10.1109/ARITH.2018.8464794","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464794","url":null,"abstract":"The Intel®AVX-512 architecture adds new capabilities such as masked execution, floating-point exception suppression and static rounding modes, as well as a small set of new instructions for mathematical library support. These new features allow for better compliance with floating-point or language standards (e.g. no spurious floating-point exceptions, and faster or more accurate code for directed rounding modes), as well as simpler, smaller footprint implementations that eliminate branches and special case paths. Performance is also improved, in particular for vector mathematical functions (which benefit from easier processing in the main path, and fast access to small lookup tables). In this paper, we describe the relevant new features and their possible applications to floating-point computation. The code examples include a few compact implementation sequences for some common vector mathematical functions.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"1 1","pages":"120-124"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77286435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Fast multiplication of binary polynomials with the forthcoming vectorized VPCLMULQDQ instruction
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464777
Nir Drucker, S. Gueron, V. Krasnov
{"title":"Fast multiplication of binary polynomials with the forthcoming vectorized VPCLMULQDQ instruction","authors":"Nir Drucker, S. Gueron, V. Krasnov","doi":"10.1109/ARITH.2018.8464777","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464777","url":null,"abstract":"Polynomial multiplication over binary fields $\\mathbb{F}_{2^{n}}$ is a common primitive, used for example by current cryptosystems such as AES-GCM (with $n=128$). It also turns out to be a primitive for other cryptosystems that are being designed for the Post Quantum era, with values $n \\gg 128$. Examples from the recent submissions to the NIST Post-Quantum Cryptography project are BIKE, LEDAKem, and GeMSS, where the performance of the polynomial multiplications is significant. Therefore, efficient polynomial multiplication over $\\mathbb{F}_{2^{n}}$, with large $n$, is a significant emerging optimization target. Anticipating future applications, Intel has recently announced that its future architecture (codename “Ice Lake”) will introduce a new vectorized way to use the current VPCLMULQDQ instruction. In this paper, we demonstrate how to use this instruction for accelerating polynomial multiplication. Our analysis shows a prediction for at least 2x speedup for multiplications with polynomials of degree 512 or more.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"71 1","pages":"115-119"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90618571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
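Multiplying binary polynomials amounts to a carry-less multiplication: partial products are combined with XOR instead of addition. The Python sketch below models that operation on integers whose bits are the polynomial coefficients, and then builds a wider product from 64-bit blocks, which is roughly the role a 64x64-bit carry-less multiply instruction plays. The function names and the schoolbook block layout are assumptions for illustration, not the paper's optimized method.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two binary polynomials stored as integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result


def clmul_by_words(a: int, b: int, word_bits: int = 64) -> int:
    """Same product, assembled from word_bits x word_bits carry-less products."""
    mask = (1 << word_bits) - 1
    result = 0
    i = 0
    while a >> (i * word_bits):
        a_word = (a >> (i * word_bits)) & mask
        j = 0
        while b >> (j * word_bits):
            b_word = (b >> (j * word_bits)) & mask
            # each block product lands at the combined word offset
            result ^= clmul(a_word, b_word) << ((i + j) * word_bits)
            j += 1
        i += 1
    return result
```

For example, `clmul(0b11, 0b11)` is `0b101`, since (x + 1)^2 = x^2 + 1 over GF(2). A production implementation would typically replace the schoolbook block loop with Karatsuba-style recombination.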
Tunable Floating-Point for Energy Efficient Accelerators
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464797
A. Nannarelli
{"title":"Tunable Floating-Point for Energy Efficient Accelerators","authors":"A. Nannarelli","doi":"10.1109/ARITH.2018.8464797","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464797","url":null,"abstract":"In this work, we address the design of an on-chip accelerator for Machine Learning and other computation-demanding applications with a Tunable Floating-Point (TFP) precision. The precision can be chosen for a single operation by selecting a specific number of bits for significand and exponent in the floating-point representation. By tuning the precision of a given algorithm to the minimum precision achieving an acceptable target error, we can make the computation more power efficient. We focus on floating-point multiplication, which is the most power demanding arithmetic operation.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"6 1","pages":"29-36"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85581806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
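The effect of choosing a smaller significand can be explored in software before committing to hardware. The Python sketch below rounds a binary64 value to a given number of significant bits with ties to even. It models only the significand width. The tunable exponent width described in the abstract is not modeled, and the function name is hypothetical.

```python
import math


def round_to_precision(x: float, sig_bits: int) -> float:
    """Round x to sig_bits significant bits (1 <= sig_bits <= 53), ties to even."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                      # x == m * 2**e with 0.5 <= |m| < 1
    scaled = round(math.ldexp(m, sig_bits))   # integer significand, ties to even
    return math.ldexp(float(scaled), e - sig_bits)


def tuned_multiply(a: float, b: float, sig_bits: int) -> float:
    """Multiply, then round the product to the chosen significand width."""
    return round_to_precision(a * b, sig_bits)
```

With an 11-bit significand, `round_to_precision(math.pi, 11)` gives `3.140625`. Sweeping `sig_bits` over an algorithm's multiplications is one way to look for the smallest width that still meets a target error.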
Radix-64 Floating-Point Divider
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464815
J. Bruguera
{"title":"Radix-64 Floating-Point Divider","authors":"J. Bruguera","doi":"10.1109/ARITH.2018.8464815","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464815","url":null,"abstract":"Digit-recurrence division is widely used in actual high-performance microprocessors because it presents a good trade-off in terms of performance, area and power consumption. In this paper we present a radix-64 divider, providing 6 bits per cycle. To have an affordable implementation, each iteration is composed of three radix-4 iterations; speculation is used between consecutive radix-4 iterations to get a reduced timing. The result is a fast, low-latency floating-point divider, requiring 11, 6, and 4 cycles for double-precision, single-precision and half-precision floating-point division with normalized operands and result. One or two additional cycles are needed in case of subnormal operand(s) or result.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"46 1","pages":"84-91"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88153788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
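The structure "one radix-64 iteration = three radix-4 iterations" can be illustrated with a plain integer digit recurrence. The Python sketch below is a simplification: it uses non-redundant quotient digits 0..3 chosen by comparison, whereas hardware dividers of this kind normally use a redundant digit set and, as the abstract notes, speculation between the radix-4 stages. All names are hypothetical.

```python
def radix4_step(w: int, d: int) -> tuple:
    """One radix-4 iteration: shift the remainder by 2 bits, pick a digit in 0..3."""
    w *= 4
    digit = 0
    while w >= d:          # digit selection by repeated comparison
        w -= d
        digit += 1
    return digit, w


def radix64_divide(x: int, d: int, iterations: int) -> tuple:
    """Return (q, w) with x * 64**iterations == q * d + w and 0 <= w < d.

    Requires 0 <= x < d. Each outer iteration retires 6 quotient bits
    by chaining three radix-4 steps.
    """
    q, w = 0, x
    for _ in range(iterations):
        for _ in range(3):
            digit, w = radix4_step(w, d)
            q = 4 * q + digit
    return q, w
```

For example, `radix64_divide(1, 3, 1)` returns `(21, 1)`, matching 64 = 21 * 3 + 1. Nine such iterations produce 54 quotient bits.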
The Comeback of Reed Solomon Codes
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464690
Nir Drucker, S. Gueron, V. Krasnov
{"title":"The Comeback of Reed Solomon Codes","authors":"Nir Drucker, S. Gueron, V. Krasnov","doi":"10.1109/ARITH.2018.8464690","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464690","url":null,"abstract":"Distributed storage systems utilize erasure codes to reduce their storage costs while efficiently handling failures. Many of these codes (e.g., Reed-Solomon (RS) codes) rely on Galois Field (GF) arithmetic, which is considered to be fast when the field characteristic is 2. Nevertheless, some developments in the field of erasure codes offer new efficient techniques that require mostly XOR operations, and are thus faster than GF operations. Recently, Intel announced [1] that its future architecture (codename “Ice Lake”) will introduce a new set of instructions called Galois Field New Instruction (GF-NI). These instructions allow software flows to perform vector and matrix multiplications over GF($2^{8}$) on the wide registers that are available on the AVX512 architectures. In this paper, we explain the functionality of these instructions, and demonstrate their usage for some fast computations in GF($2^{8}$). We also use the Intel® Intelligent Storage Acceleration Library (ISA-L) in order to estimate potential future improvement for erasure codes that are based on RS codes. Our results predict $\\approx 1.4\\mathrm{x}$ speedup for vectorized multiplication, and 1.83x speedup for the actual encoding.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"56 1","pages":"125-129"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82556632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
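Reed-Solomon encoding reduces to multiplications and XOR-accumulations of bytes treated as elements of GF(2^8). The Python sketch below shows a bit-serial byte multiplication and the dot product that forms one parity byte. The default reduction polynomial `0x11B` (x^8 + x^4 + x^3 + x + 1, the AES polynomial) is an assumption chosen for illustration; storage libraries may use a different polynomial, which is why it is a parameter. The function names are hypothetical.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes as elements of GF(2^8) modulo the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:        # degree reached 8: reduce modulo poly
            a ^= poly
        b >>= 1
    return result


def gf256_dot(coefficients, data, poly: int = 0x11B) -> int:
    """XOR-accumulate coefficient * data products, as in one RS parity byte."""
    parity = 0
    for c, d in zip(coefficients, data):
        parity ^= gf256_mul(c, d, poly)
    return parity
```

With the default polynomial, `gf256_mul(0x57, 0x83)` is `0xC1`, the worked example from the AES specification. A vectorized instruction performs many such byte multiplications in parallel across a wide register.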
Faster Modular Exponentiation Using Double Precision Floating Point Arithmetic on the GPU
2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464792
Niall Emmart, Fangyu Zheng, C. Weems
{"title":"Faster Modular Exponentiation Using Double Precision Floating Point Arithmetic on the GPU","authors":"Niall Emmart, Fangyu Zheng, C. Weems","doi":"10.1109/ARITH.2018.8464792","DOIUrl":"https://doi.org/10.1109/ARITH.2018.8464792","url":null,"abstract":"This paper presents a new approach to integer multiple precision (MP) modular exponentiation, using double-precision floating point (DPF) operations, that is suitable for GPU implementation. We show speedups ranging from 20% to 34% over the best prior GPU times for sizes corresponding to common RSA cryptographic operations (2048 to 4096 bits). Three techniques are described. First, by adding $2^{104}$ to the high half of the product, and $2^{52}$ to the low half, we set the implicit leading 1 in the DPF mantissa so that the full 52 explicit bits are available for each half of the 104-bit products of samples. Second, the DPF values are cast bitwise to 64-bit integers for adding the column sums to get the MP result. Normally the cast would require masking off the exponents, but because they are constant, we can include them in the column sums and correct just once for their total. Third, by initializing the column sums with the appropriate negative value to compensate for the exponent sums, no corrective subtraction is needed. Our implementation on an NVIDIA GTX Titan Black GPU achieves between 132.5K and 161.9K modular exponentiations per second of size 1024 bits, with latencies ranging from 21.7 ms to 17.8 ms, making it practical for online RSA applications. Proportional results are shown for 1536 and 2048 bits. The implementation is so efficient that its maximum sustained performance is actually bounded by the thermal limit of the GPU.","PeriodicalId":6576,"journal":{"name":"2018 IEEE 25th Symposium on Computer Arithmetic (ARITH)","volume":"2013 1","pages":"130-137"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82608726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
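The low-half trick in this abstract relies on a property of binary64: adding 2^52 to an integer below 2^52 places that integer directly in the 52 explicit mantissa bits, under a constant exponent field. The Python sketch below demonstrates this for the low half only, together with the idea of summing raw bit patterns and compensating for the constant exponents up front. It uses unbounded Python integers where a GPU kernel would rely on 64-bit arithmetic, and all names are hypothetical.

```python
import struct

MASK_52 = (1 << 52) - 1
EXP_2_52 = 0x433 << 52   # sign and exponent bits shared by every double in [2**52, 2**53)


def double_bits(d: float) -> int:
    """Reinterpret a binary64 value as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", d))[0]


def biased_bits(n: int) -> int:
    """Bit pattern of float(n) + 2**52 for an integer 0 <= n < 2**52."""
    return double_bits(float(n) + 2.0**52)


def column_sum(values) -> int:
    """Sum integers via their raw bit patterns, without masking each exponent.

    The accumulator starts at the negated total of the constant exponent
    fields, so no corrective subtraction is needed afterwards.
    """
    values = list(values)
    total = -len(values) * EXP_2_52
    for v in values:
        total += biased_bits(v)
    return total
```

For example, `biased_bits(12345) & MASK_52` recovers `12345`, and `biased_bits(12345) >> 52` is `0x433` regardless of the value.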