2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH): Latest Publications

A CRC-Based Concurrent Fault Detection Architecture for Galois/Counter Mode (GCM)

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.19

Amir Ali Kouzeh Geran, A. Reyhani-Masoleh

Citations: 3

Hybrid Position-Residues Number System

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.15

Karim Bigou, A. Tisserand

Citations: 12

A New Multiplication Algorithm for Extended Precision Using Floating-Point Expansions

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.18

J. Muller, Valentina Popescu, P. T. P. Tang

Abstract: Some important computational problems must use a floating-point (FP) precision several times higher than the one available in hardware. These computations critically rely on software libraries for high-precision FP arithmetic. The representation of a high-precision data type crucially influences the corresponding arithmetic algorithms. Recent work showed that algorithms for FP expansions, that is, a representation based on an unevaluated sum of standard FP types, benefit from various high-performance support for native FP, such as low latency, high throughput, vectorization, threading, etc. Bailey's QD library and its corresponding Graphics Processing Unit (GPU) version, GQD, are such examples. Despite using native FP arithmetic as the key operations, QD and GQD algorithms are focused on double-double or quad-double representations and do not generalize efficiently or naturally to a flexible number of components in the FP expansion. In this paper, we introduce a new multiplication algorithm for FP expansions with flexible precision, with up to the order of tens of FP elements in mind. The main feature is that the partial products are accumulated in a specially designed data structure that has the regularity of a fixed-point representation while allowing the computation to be naturally carried out using native FP types. This allows us to easily avoid unnecessary computation and to present rigorous accuracy analysis transparently. The algorithm, its correctness and accuracy proofs and some performance comparisons with existing libraries are all contributions of this paper.
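To make the idea of an FP expansion as an unevaluated sum of native floats concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it uses the classical error-free transformations (Knuth's TwoSum and Dekker's product with Veltkamp splitting) and a deliberately naive all-pairs multiplication with no accumulation structure, truncation, or renormalization. All function names are hypothetical, and binary64 arithmetic with round-to-nearest and no overflow or underflow is assumed.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def _split(a):
    """Veltkamp split of a binary64 value into a high and a low part."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    return hi, a - hi


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and a * b = p + e exactly
    (assuming no overflow or underflow)."""
    p = a * b
    ah, al = _split(a)
    bh, bl = _split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def naive_expansion_mul(x, y):
    """Multiply two expansions (lists of floats) by collecting every
    partial product and its rounding error as one long unevaluated sum."""
    terms = []
    for xi in x:
        for yj in y:
            terms.extend(two_prod(xi, yj))
    return terms


def exact(expansion):
    """Exact rational value represented by an expansion."""
    return sum(Fraction(t) for t in expansion)


x = [0.1, 1e-18]
y = [0.3, 1e-19]
product_terms = naive_expansion_mul(x, y)
```

The result has twice as many terms as there are partial products and its components overlap, which is exactly the cost a practical algorithm has to control by discarding negligible terms and renormalizing.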

Citations: 7

Optimizing Modular Multiplication for NVIDIA's Maxwell GPUs

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.21

Niall Emmart, J. Luitjens, C. Weems, Cliff Woolley

Abstract: In this paper we show how we were able to achieve record rates of multiple precision (MP) modular multiplication (mulmod) operations in the new NVIDIA MP math library (XMP) on Maxwell, NVIDIA's most recent generation of graphics processing units (GPUs). Mulmod is a key operation that is used in multiple places within the MP library, and has many real-world applications, especially in cryptography, which makes it important to achieve a highly optimized implementation. Here we reveal how multiple techniques were combined to make the best use of the GPU's instructions, registers, memory, and threads. A particularly interesting algorithmic aspect, designed to work with the 16-bit hardware multipliers found in Maxwell, is the use of a two-pass process to first compute unaligned partial products, then shift the result 16 bits to the left, then compute the aligned partial products. The new algorithms are much faster than the prior, state-of-the-art, row-oriented multiply and reduce approach, achieving speedups of 61% at 256 bits, and 117% at 512 bits, with peak rates of 4027 million mulmod operations at 256 bits and 1081 million at 512 bits on a GTX 980Ti.
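The two-pass idea can be illustrated on a single 32x32-bit product built from 16x16-bit multiplies. The sketch below is one plausible reading of the abstract, not the XMP implementation: it assumes that "unaligned" means the cross products whose weight is 2^16 (they straddle the 32-bit word boundary) and "aligned" means the products whose weight is 2^0 or 2^32. The function name is hypothetical, and Python's unbounded integers hide the carry handling that real GPU code must do explicitly.

```python
MASK16 = 0xFFFF


def mul32_two_pass(a, b):
    """64-bit product of two 32-bit words using only 16x16-bit products."""
    al, ah = a & MASK16, a >> 16
    bl, bh = b & MASK16, b >> 16

    # Pass 1: cross products, which sit 16 bits off the word boundary.
    acc = al * bh + ah * bl

    # Shift the pass-1 sum 16 bits to the left so it lines up with the rest.
    acc <<= 16

    # Pass 2: products that already sit on 32-bit word boundaries.
    acc += al * bl + ((ah * bh) << 32)
    return acc


samples = [(0, 0), (1, 0xFFFFFFFF), (0xFFFFFFFF, 0xFFFFFFFF), (0x12345678, 0x9ABCDEF0)]
results = [mul32_two_pass(a, b) for a, b in samples]
```

Grouping the cross products first means a single shift fixes their alignment, instead of shifting each one separately.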

Citations: 10

Random Digit Representation of Integers

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.11

N. Méloni, M. A. Hasan

Citations: 6

Hardware Implementation of AES Using Area-Optimal Polynomials for Composite-Field Representation GF(2^4)^2 of GF(2^8)

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.32

S. Gueron, S. Mathew

Abstract: This paper discusses the question of optimizing AES hardware designs by using the composite field representation GF(2^4)^2 of the field GF(2^8) that underlies the definition of AES. Here, GF(2^4)^2 is the field extension of the ground field GF(2^4) with an extension polynomial of the form x^2 + αx + β, where α and β are elements of the field GF(2^4). Previous designs with such representations used α = 1, which seemingly leads to some obvious savings. By contrast, we seek the optimal designs among all the possibilities. Our designs are based on mapping the input, output, round keys, and the AES operations to and from any one of the 2880 possible representations of GF(2^8) as GF(2^4)^2. For each representation, we also explore three options for the affine/inverse-affine constants, resulting in a total of 8640 possible designs. We identify the smallest-area representations for AES encryption-only, decryption-only, and for unified encryption-decryption. Surprisingly, the optimal representations in each case are different from each other. In addition, we identify six distinct representations that are optimal, based on operating mode and AES pipeline depth. Among other results, we show here a set of high-bandwidth 16-byte AES datapaths with extension polynomials of the form x^2 + αx + β where α ≠ 1, showing that the a priori obvious choice of using α = 1 does not necessarily lead to the best result. We provide the full details of all the design possibilities, together with their respective area, based on a 22nm CMOS implementation.
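As a small illustration of arithmetic in GF(2^4)^2 with a general extension polynomial x^2 + αx + β, the Python sketch below multiplies two composite-field elements and enumerates the irreducible extension polynomials by brute force. It is a sketch under stated assumptions, not the hardware design of the paper: the ground-field polynomial x^4 + x + 1 is an assumed choice, elements are written as pairs (hi, lo) meaning hi·x + lo, and all names are hypothetical.

```python
GF16_POLY = 0b10011  # x^4 + x + 1, an assumed choice for the ground field


def gf16_mul(a, b):
    """Multiply two GF(2^4) elements (4-bit ints) modulo GF16_POLY."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(6, 3, -1):  # reduce the terms of degree 6, 5, 4
        if (r >> i) & 1:
            r ^= GF16_POLY << (i - 4)
    return r


def is_irreducible(alpha, beta):
    """x^2 + alpha*x + beta is irreducible iff it has no root in GF(2^4)."""
    return all(gf16_mul(t, t) ^ gf16_mul(alpha, t) ^ beta for t in range(16))


def composite_mul(a, b, alpha, beta):
    """Multiply (a1*x + a0)(b1*x + b0) using x^2 = alpha*x + beta."""
    a1, a0 = a
    b1, b0 = b
    hh = gf16_mul(a1, b1)  # coefficient of x^2 before reduction
    hi = gf16_mul(a1, b0) ^ gf16_mul(a0, b1) ^ gf16_mul(alpha, hh)
    lo = gf16_mul(a0, b0) ^ gf16_mul(beta, hh)
    return hi, lo


# Every (alpha, beta) that yields a valid extension field.
IRREDUCIBLE = [(al, be) for al in range(16) for be in range(16)
               if is_irreducible(al, be)]

# Pick one extension polynomial with alpha != 1 for demonstration.
ALPHA, BETA = next((al, be) for al, be in IRREDUCIBLE if al != 1)
```

With α = 1 the term `gf16_mul(alpha, hh)` costs nothing, which is the "obvious saving" the abstract mentions; a general α adds a constant multiplication there.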

Citations: 10

Efficient Combinational Circuits for Division by Small Integer Constants

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.23

H. F. Ugurdag, A. Bayram, Vecdi Emre Levent, Sezer Gören

Citations: 2

Recovering Numerical Reproducibility in Hydrodynamic Simulations

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.27

P. Langlois, R. Nheili, C. Denis

Citations: 5

Accelerating Big Integer Arithmetic Using Intel IFMA Extensions

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.22

S. Gueron, V. Krasnov

Citations: 14

A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) Pub Date: 2016-07-10 DOI: 10.1109/ARITH.2016.8

Xiaoping Cui, Weiqiang Liu, Dong Wenwen, F. Lombardi

Citations: 9