{"title":"Specifications for a variable-precision arithmetic coprocessor","authors":"T. E. Hull, M. S. Cohen, C. Hall","doi":"10.1109/ARITH.1991.145548","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145548","url":null,"abstract":"The authors have been developing a programming system intended to be especially convenient for scientific computing. Its main features are variable precision (decimal) floating-point arithmetic and convenient exception handling. The software implementation of the system has evolved over a number of years, and a partial hardware implementation of the arithmetic itself was constructed and used during the early stages of the project. Based on this experience, the authors have developed a set of specifications for an arithmetic coprocessor to support such a system. These specifications are described. An outline of the language features and how they can be used is also provided, to help justify the particular choice of coprocessor specifications. The authors also indicate what other hardware features would be most helpful to the systems programmer, especially for implementing the exception handling.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116795339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast division using accurate quotient approximations to reduce the number of iterations","authors":"D. Wong, M. Flynn","doi":"10.1109/ARITH.1991.145559","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145559","url":null,"abstract":"A class of iterative integer division algorithms based on lookup-table Taylor-series approximations to the reciprocal is presented. The algorithm iterates by using the reciprocal to find an approximate quotient and then subtracting the quotient multiplied by the divisor from the dividend to find a remaining dividend. Fast implementations can produce an average of either 14 or 27 bits per iteration, depending on whether the basic or the advanced version of the method is implemented. Detailed analyses are presented to support the claimed accuracy per iteration. Speed estimates using state-of-the-art ECL (emitter-coupled logic) components show that this method is faster than the Newton-Raphson technique and can produce 53-bit quotients of 53-bit numbers in about 28 ns and 22 ns for the basic and advanced versions, respectively.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130779484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of on-line arithmetic algorithms to the SVD computation: preliminary results","authors":"P. Tu, M. Ercegovac","doi":"10.1109/ARITH.1991.145568","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145568","url":null,"abstract":"A scheme for the singular value decomposition (SVD) problem, based on on-line arithmetic, is discussed. The design, which uses radix-2 floating-point on-line operations and is implemented in LSI HCMOS gate-array technology, is compared with a comparable conventional arithmetic implementation. The preliminary results indicate that the proposed on-line approach achieves a speedup of 2.4 to 3.2 with respect to the conventional solutions, using 1.3 to 5.5 times as many gates but less than one-sixth as many interconnections.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125992834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delay optimization of carry-skip adders and block carry-lookahead adders","authors":"P. K. Chan, M. Schlag, C. Thomborson, V. Oklobdzija","doi":"10.1109/ARITH.1991.145552","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145552","url":null,"abstract":"The worst-case carry propagation delays in carry-skip adders and block carry-lookahead adders depend both on how the full adders are grouped into blocks and on the number of levels. The authors report a multidimensional dynamic programming paradigm for configuring these two adders to attain minimum latency. Previous methods apply only to very limited delay models and do not guarantee a minimum-latency configuration. Under the proposed delay model, the critical path delay is calculated taking into account not only the intrinsic gate delays but also the fan-in and fan-out contributions.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128955682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a floating-point quasi-systolic general purpose CORDIC rotator for high-rate parallel data and signal processing","authors":"A. D. Lange, E. Deprettere","doi":"10.1109/ARITH.1991.145571","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145571","url":null,"abstract":"The authors describe the design and implementation of an algorithm and a processor that can be used to accelerate computations involving large numbers of rotations (circular as well as hyperbolic). The processor is a low-cost, high-throughput VLSI implementation of the algorithm. At 10^7 rotations per second, many real-time and interaction-time applications in scientific computation become feasible. The required storage and/or silicon area is low, and the execution time is independent of the particular operation performed. Other features of this CORDIC design are its pipelined architecture and its floating-point extension; it is angle-pipelinable at the bit level.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124986731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate and monotone approximations of some transcendental functions","authors":"W. Ferguson, T. Brightman","doi":"10.1109/ARITH.1991.145566","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145566","url":null,"abstract":"A technique for computing monotonicity-preserving approximations F_a(x) of a function F(x) is presented. This technique involves computing an extra-precise approximation of F(x) that is rounded to produce the value of F_a(x). For example, only a few extra bits of precision are used to make the accurate transcendental functions found on the Cyrix FasMath line of 80387-compatible math coprocessors monotonic.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"252 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133407189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A redundant binary Euclidean GCD algorithm","authors":"S. N. Parikh, D. Matula","doi":"10.1109/ARITH.1991.145563","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145563","url":null,"abstract":"An efficient implementation of the Euclidean GCD (greatest common divisor) algorithm employing the redundant binary number system is described. The time complexity is O(n), using O(n) 4-2 signed 1-bit adders to determine the GCD of two n-bit integers. The process is similar to that used in SRT division. In terms of the number of shift and add/subtract operations, the algorithm is competitive, to within a small factor, with floating-point division. The novelty of the algorithm rests on properties derived from the proposed normalization scheme for signed-bit fractions. The implementation is well suited to systolic hardware design.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114067360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OCAPI: architecture of a VLSI coprocessor for the GCD and the extended GCD of large numbers","authors":"A. Guyot","doi":"10.1109/ARITH.1991.145564","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145564","url":null,"abstract":"Various algorithms for finding the greatest common divisor (GCD) and the extended GCD of very large integers are explored. In particular, the tradeoff between computation time and area is examined. Two of the algorithms are detailed; variants can be derived from them in a straightforward way. The architecture of a VLSI processor dedicated to GCD as well as multiply, divide, square root, etc. of very large numbers (>600 decimal digits), using an internal radix-2 redundant representation and supporting multiple precision, is then described.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114689215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation of numbers in nonclassical numeration systems","authors":"Christiane Frougny","doi":"10.1109/ARITH.1991.145528","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145528","url":null,"abstract":"Numeration systems whose bases are defined by a linear recurrence with integer coefficients are considered. Conditions on the recurrence are given under which the normalization function, which transforms any representation of an integer into the normal one (obtained by the usual algorithm), can be realized by a finite automaton. Addition is a particular case of normalization. The same questions are discussed for the representation of real numbers in base θ, where θ > 1 is a real number. In particular, it is shown that if θ is a Pisot number, then normalization and addition in base θ are computable by a finite automaton.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"77 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134411195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 160 ns 54 bit CMOS division implementation using self-timing and symmetrically overlapped SRT stages","authors":"T. Williams, M. Horowitz","doi":"10.1109/ARITH.1991.145561","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145561","url":null,"abstract":"A full-custom VLSI chip demonstrates an arithmetic implementation that computes the mantissa of a 54-bit (floating-point double-precision) division in 45 ns to 160 ns, depending on the data. The design uses self-timing to avoid both the need to partition logic into clock cycles and the need for high-speed clocks. Self-timing allows the circuits to iterate with no overhead beyond the pure combinational logic delays. Because of dynamic path ordering, it also allows a more efficient symmetric overlapped execution of the SRT stages. The design includes several other performance enhancements, and their effects on performance are discussed. Taken together, these architectural enhancements provide a twofold performance increase over a straightforward SRT approach.","PeriodicalId":190650,"journal":{"name":"[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123093880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}