2011 IEEE 20th Symposium on Computer Arithmetic: Latest Papers

The Arithmetic Operators You Will Never See in a Microprocessor

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.33

F. D. Dinechin

Citations: 6

Radix-16 Combined Division and Square Root Unit

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.30

A. Nannarelli

Citations: 13

Towards a Quaternion Complex Logarithmic Number System

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.14

M. Arnold, J. Cowles, Vassilis Paliouras, I. Kouretas

Abstract: The well-known generalization of real to complex arithmetic (two reals) extends further to more obscure quaternion arithmetic (four reals), which has applications in signal processing, aerospace, graphics and virtual reality. Quaternion multiplication implements 3D rotation, but is expensive (usually 16 floating-point multiplications and 12 additions). This paper proposes an alternative quaternion representation using logarithms to reduce multiplication cost. The real Logarithmic Number System (LNS) allows fast and inexpensive multiplication and division in embedded and FPGA-based systems. Recent advances in the Complex LNS (CLNS) [5] have made fast log-polar complex representation affordable. Although the quaternion logarithm function is also well-defined, it is not useful to simplify multiplication (in the same way real and complex logarithms are) because quaternion multiplication is not commutative but quaternion addition is. To overcome this, we propose a novel Quaternion Complex (QCLNS) representation using a pair of CLNS numbers. This representation implements quaternion multiplication using only the theoretical minimum [11], [15] of 8 LNS multipliers (i.e., fixed-point adders) and two CLNS adders. Because CLNS numbers are more compact than ordinary rectangular complex representation, single-precision QCLNS occupies 10.9 percent less memory than conventional quaternion representation. Extrapolating conventional LNS and floating-point synthesis data from Fu et al. [12], QCLNS saves on average 10 percent of FPGA resources for precisions between 13 and 45 bits.
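The cost figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the paper's QCLNS scheme: `hamilton_product` is the textbook rectangular quaternion product (16 multiplications, 12 additions or subtractions), and the `lns_*` helpers are a hypothetical toy model of a real LNS value as a (sign, log2 magnitude) pair, shown only to make concrete why an LNS multiplier reduces to an adder. All function names are assumptions made for this illustration, and zero is deliberately not representable in the toy encoding.

```python
import math

def hamilton_product(p, q):
    """Textbook quaternion product of p = (w, x, y, z) and q.

    Uses 16 multiplications and 12 additions/subtractions,
    the conventional cost mentioned in the abstract.
    """
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

# Toy real-valued LNS (assumption: nonzero inputs only).
def lns_encode(x):
    """Encode nonzero x as (is_negative, log2|x|)."""
    return (x < 0, math.log2(abs(x)))

def lns_mul(u, v):
    """Multiply two LNS values: XOR the signs, add the logarithms."""
    return (u[0] != v[0], u[1] + v[1])

def lns_decode(u):
    """Convert an LNS value back to an ordinary float."""
    sign = -1.0 if u[0] else 1.0
    return sign * 2.0 ** u[1]

if __name__ == "__main__":
    i = (0, 1, 0, 0)
    j = (0, 0, 1, 0)
    print(hamilton_product(i, j))  # i * j
    print(hamilton_product(j, i))  # j * i differs: multiplication is not commutative
    print(lns_decode(lns_mul(lns_encode(8.0), lns_encode(-4.0))))
```

The two products in the usage lines differ in sign, which is the non-commutativity the abstract cites as the obstacle to using a single quaternion logarithm.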

Citations: 8

Automatic Generation of Fast and Certified Code for Polynomial Evaluation

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.39

C. Mouilleron, G. Revy

Citations: 27

Short Division of Long Integers

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.11

David Harvey, P. Zimmermann

Citations: 2

Radix-8 Digit-by-Rounding: Achieving High-Performance Reciprocals, Square Roots, and Reciprocal Square Roots

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.28

J. A. Butts, P. T. P. Tang, R. Dror, D. Shaw

Abstract: We describe a high-performance digit-recurrence algorithm for computing exactly rounded reciprocals, square roots, and reciprocal square roots in hardware at a rate of three result bits -- one radix-8 digit -- per recurrence iteration. To achieve a single-cycle recurrence at a short cycle time, we adapted the digit-by-rounding algorithm, which is normally applied at much higher radices, for efficient operation at radix 8. Using this approach avoids in the recurrence step the lookup table required by SRT -- the usual algorithm used for hardware digit recurrences. The increasing access latency of this table, the size of which grows super linearly in the radix, limits high-frequency SRT implementations to radix 4 or lower. We also developed a series of novel optimizations focused on further reducing the critical path through the recurrence. We propose, for example, decreasing data path widths to a point where erroneous results sometimes occur and then correcting these errors off the critical path. We present a specific implementation that computes any of these functions to 31 bits of precision in 13 cycles. Our implementation achieves a cycle time only 11% longer than the best reported SRT design for the same functions, yet delivers results in five fewer cycles. Finally, we show that even at lower radices, a digit-by-rounding design is likely to have a shorter critical path than one using SRT at the same radix.

Citations: 4

Composite Iterative Algorithm and Architecture for q-th Root Calculation

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.16

Álvaro Vázquez, J. Bruguera

Citations: 12

A 1.5 GHz VLIW DSP CPU with Integrated Floating Point and Fixed Point Instructions in 40 nm CMOS

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.20

T. Anderson, Duc Bui, S. Moharil, Soujanya Narnur, Mujibur Rahman, A. Lell, Eric Biscondi, A. Shrivastava, P. Dent, Mingjian Yan, Hasan Mahmood

Citations: 9

High Degree Toom'n'Half for Balanced and Unbalanced Multiplication

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.12

Marco Bodrato

Citations: 4

Flocq: A Unified Library for Proving Floating-Point Algorithms in Coq

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI: 10.1109/ARITH.2011.40

S. Boldo, G. Melquiond

Citations: 134