Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336): Latest Articles

On the design of high-radix on-line division for long precision
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762827
A. Tenca, M. Ercegovac
{"title":"On the design of high-radix on-line division for long precision","authors":"A. Tenca, M. Ercegovac","doi":"10.1109/ARITH.1999.762827","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762827","url":null,"abstract":"We present a design of a high-radix on-line division suitable for long precision computations. The proposed scheme uses a quotient-digit selection function based on the residual rounding and scaling of the operands. The bounds on the number of cycles and the cycle time for radix 2^k and n-bit precision are obtained in terms of full-adder delays. The speedup with respect to radix 2 is greater than 3.3 for k≥6 and n≥64. The cost increases as a function of the radix. For the case r=64 and n=64, the increase in area with respect to r=2 is about 6.6 times plus a 512×10-bit table. The proposed scheme has been designed and verified using VHDL and a 1.2 μm CMOS standard gate technology from MOSIS library.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124475714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
A 32 bit logarithmic arithmetic unit and its performance compared to floating-point
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762839
J. N. Coleman, E. Chester
{"title":"A 32 bit logarithmic arithmetic unit and its performance compared to floating-point","authors":"J. N. Coleman, E. Chester","doi":"10.1109/ARITH.1999.762839","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762839","url":null,"abstract":"As an alternative to floating-point, several papers have proposed the use of a logarithmic number system, in which a real number is represented as a fixed-point logarithm. Multiplication and division therefore proceed in minimal time with no rounding error. However, the system can only offer an overall advantage if addition and subtraction can be performed with speed and accuracy at least equal to that of floating-point, but these operations require the interpolation of a non-linear function which has hitherto been either time-consuming or inaccurate. We present a procedure by which additions and subtractions can be performed rapidly and accurately, and show that these operations are thereby competitive with their floating-point equivalents. We then show that the average performance of the logarithmic system exceeds floating-point, in terms of both speed and accuracy.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"304 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115830062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50
Floating-point unit in standard cell design with 116 bit wide dataflow
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762853
Guenter Gerwig, M. Kroener
{"title":"Floating-point unit in standard cell design with 116 bit wide dataflow","authors":"Guenter Gerwig, M. Kroener","doi":"10.1109/ARITH.1999.762853","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762853","url":null,"abstract":"The floating point unit of a S/390 CMOS microprocessor is described. It contains a 116 bit fraction data flow for addition and subtraction and a 64 bit-wide multiplier. Besides the register array, there are no other dataflow macros used; it is fully designed with standard cell books and is placed flat with a timing driven placement algorithm. This design method allows more 'irregular' structures than usually found in custom designs. An overview of the floating point unit is given and some interesting design items are shown: a 120 bit-wide true-complement adder with precounting of leading zero digits, a signed multiplier with bit-optimized Wallace tree, intensive forwarding in source equal target cases and the checking method.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122517811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Low-power division: comparison among implementations of radix 4, 8 and 16
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762829
A. Nannarelli, T. Lang
{"title":"Low-power division: comparison among implementations of radix 4, 8 and 16","authors":"A. Nannarelli, T. Lang","doi":"10.1109/ARITH.1999.762829","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762829","url":null,"abstract":"Although division is less frequent than addition and multiplication, because of its longer latency it dissipates a substantial part of the energy in floating-point units. In this paper we explore the relation between the radix and the energy dissipated. Previous work has been done on radix-4 and radix-8 division. Here we extend this study to a radix-16 scheme with two overlapped radix-4 stages and compare the latency, area, and energy of the three implementations. Results show that by applying the low-power techniques the energy dissipation is reduced by 30% to 40% with respect to the standard implementation. An additional 20% reduction can be obtained using a dual voltage. Moreover, the energy dissipated to complete the division is roughly the same for the three radices. However, the power dissipation, proportional to the average current, increases with the radix. If reducing the energy is the priority, for the same latency radix-16 with dual voltage produces the smallest energy dissipation.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125565612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
VLSI costs of arithmetic parallelism: a residue reverse conversion perspective
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762843
M. Bhardwaj, T. Srikanthan, C. Clarke
{"title":"VLSI costs of arithmetic parallelism: a residue reverse conversion perspective","authors":"M. Bhardwaj, T. Srikanthan, C. Clarke","doi":"10.1109/ARITH.1999.762843","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762843","url":null,"abstract":"This paper reports how VLSI cost metrics (area, delay, power) of residue reverse converters scale with the cardinality and dynamic range of moduli sets. The study uses CMAC reverse converters, reported previously by the authors to be the most efficient known to date in terms of area and delay. In all, 134 reverse converters with dynamic ranges from 32 to 120 bits and set cardinalities ranging from 4 to 20 are actually constructed and analyzed. It is seen that area, delay and power costs are cardinality insensitive once the cardinality exceeds a threshold (usually between five and eight). For cardinalities beyond this threshold, conversion costs are essentially dynamic range dependent. This insensitivity is explained in detail by noting the counterbalancing effects of the various sub-units of a CMAC reverse converter. Since practical implementations of RNS usually employ cardinalities beyond the abovementioned thresholds, the significance of this study is its conclusion that increasing the set cardinality in most implementations will have a marginal, if any, effect on VLSI reverse conversion costs.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134183548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Area×delay (A·T) efficient multiplier based on an intermediate hybrid signed-digit (HSD-1) representation
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762847
Jeng-Jong J. Lue, D. Phatak
{"title":"Area×delay (A·T) efficient multiplier based on an intermediate hybrid signed-digit (HSD-1) representation","authors":"Jeng-Jong J. Lue, D. Phatak","doi":"10.1109/ARITH.1999.762847","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762847","url":null,"abstract":"Intermediate Signed Digit (SD) representation can facilitate fast and compact VLSI implementations of partial product accumulation trees. It achieves a reduction ratio of 2:1 at every level and also leads to more regular layouts. Its disadvantage is that the number of bit lines that need to be routed can be high. This can lead to a significant area overhead especially at smaller feature sizes where the wire/interconnect area and delay can be dominant. A Hybrid Signed Digit (HSD) representation lets some of the digits be unsigned bits, thereby reducing the number of bit lines. By arbitrarily varying the positions of and distances between consecutive signed digits, this representation can trade off latency for area and offers a continuum of choices between the two's complement representation on the one hand and fully Signed Digit (FSD or simply SD) representation on the other. We illustrate an A·T (area×delay) efficient multiplier based on the HSD-1 representation which is one of the many possible HSD formats, wherein every alternate digit is signed and the rest are unsigned (ordinary) bits. It is seen that multipliers based on HSD-1 format require more transistors than those based on FSD format. However, they require fewer bit lines to be routed, which substantially reduces the interconnect area; thereby leading to a reduction in the total VLSI area and a lower A·T product. The design reaffirms that the interconnect area can be significant, especially at small feature sizes.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132390589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Intermediate variable encodings that enable multiplexor-based implementations of two operand addition
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762824
D. Phatak, I. Koren
{"title":"Intermediate variable encodings that enable multiplexor-based implementations of two operand addition","authors":"D. Phatak, I. Koren","doi":"10.1109/ARITH.1999.762824","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762824","url":null,"abstract":"In two operand addition, bit-wise intermediate variables such as the \"propagate\" and \"generate\" terms are defined/evaluated first. Basic carry propagation recursion is then expressed in terms of these variables and is \"unrolled\" to obtain a tree structure for fast execution. In CMOS VLSI technology, multiplexors are fast and efficient to implement. Hence, we investigate in this paper all possible two-bit encodings for the intermediate variables and identify the ones that enable multiplexor-based implementations. Some of these encodings enable further simplification of the multiplexor-based realizations. Our analysis also shows that adopting an intermediate signed-digit representation simply amounts to selecting one of the possible encodings. Thus, there is no inherent advantage to the use of intermediate signed-digit representations in a two operand addition. Finally, we extend our analysis to the generalized look-ahead-recursions proposed by R.W. Doran (1988).","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122342696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Reduced latency IEEE floating-point standard adder architectures
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762826
A. Beaumont-Smith, N. Burgess, S. Lefrere, C. Lim
{"title":"Reduced latency IEEE floating-point standard adder architectures","authors":"A. Beaumont-Smith, N. Burgess, S. Lefrere, C. Lim","doi":"10.1109/ARITH.1999.762826","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762826","url":null,"abstract":"The design and implementation of a double precision floating-point IEEE-754 standard adder is described which uses \"flagged prefix addition\" to merge rounding with the significand addition. The floating-point adder is implemented in 0.5 μm CMOS, measures 1.8 mm², has a 3-cycle latency and implements all rounding modes. A modified version of this floating-point adder can perform accumulation in 2-cycles with a small amount of extra hardware for use in a parallel processor node. This is achieved by feeding back the previous un-normalised but correctly rounded result together with the normalisation distance. A 2-cycle latency floating-point adder architecture with potentially the same cycle time that also employs flagged prefix addition is described. It also incorporates a fast prediction scheme for the true subtraction of significands with an exponent difference of 1, with one less adder.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126061674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 69
Montgomery modular exponentiation on reconfigurable hardware
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762831
Thomas Blum
{"title":"Montgomery modular exponentiation on reconfigurable hardware","authors":"Thomas Blum","doi":"10.1109/ARITH.1999.762831","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762831","url":null,"abstract":"It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine the Montgomery modular multiplication algorithm with a new systolic array design, which is capable of processing a variable number of bits per array cell. The designs are flexible, allowing any choice of operand and modulus. Unlike previous approaches, we systematically implement and compare several variants of our new architecture for different bit lengths. We provide absolute area and timing measures for each architecture. The results allow conclusions about the feasibility and time-space trade-offs of our architecture for implementation on Xilinx XC4000 series FPGAs. As a major practical result we show that it is possible to implement modular exponentiation at secure bit lengths on a single commercially available FPGA.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"37 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125733835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 206
Complex logarithmic number system arithmetic using high-radix redundant CORDIC algorithms
Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) Pub Date : 1999-04-14 DOI: 10.1109/ARITH.1999.762845
D. Lewis
{"title":"Complex logarithmic number system arithmetic using high-radix redundant CORDIC algorithms","authors":"D. Lewis","doi":"10.1109/ARITH.1999.762845","DOIUrl":"https://doi.org/10.1109/ARITH.1999.762845","url":null,"abstract":"This paper describes the application of high radix redundant CORDIC algorithms to complex logarithmic number system arithmetic. It shows that a CLNS addition can be performed with approximately the same hardware as a high-radix CORDIC operation. A design example comparable to single precision floating point has been designed and simulated.","PeriodicalId":434169,"journal":{"name":"Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132322326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24