A fast carry-free algorithm and hardware design for extended integer GCD computation

Symposium on Symbolic and Algebraic Manipulation. Pub Date: 1986-10-01. DOI: 10.1145/32439.32455

D. Yun, Chang Nian Zhang


Citations: 14

Abstract

It is well known that finding the greatest common divisor (GCD) of two integers is one of the fundamental computations in exact rational arithmetic, factorization and cryptography. Euclid's algorithm and its variants are widely used for GCD computations [Knu 81]. However, they are not suitable for parallel computation, since whole-word comparisons are required. G. B. Purdy [Pur 83] proposed a different way to compute the GCD which requires no comparison. The advantage of Purdy's algorithm is that it provides a possible way to shorten each iteration by using the carry-save technique. However, it requires O(n^2) iterations in the worst case, where n denotes the number of bits of the two inputs. In addition, it requires additional hardware support to handle the overflow problem. R. P. Brent and H. T. Kung [B&K 85] have developed a plus-minus (PM) algorithm that tests only the two least significant bits of the two integers. The advantage of the PM algorithm is that the number of iterations is at most 3.012*n. In particular, this gives a linear-time implementation on a systolic array [B&K 85]. Although the serial-in-serial-out GCD design is reasonably efficient in its use of silicon area, the delay between the first input and the first output of a computation is greater than 3n time units, which may be undesirably long depending on the application. The basic idea in our algorithm is to combine two sequential operations of the Brent-Kung PM algorithm into one basic operation, and also to avoid swap operations during the iterations in order to achieve higher parallelism. It has been proved that for any two n-bit integers, the number of iterations of the new algorithm is less than 1.51*n + 1. A preliminary hardware design shows that the algorithm can be implemented in a simple way, using several conventional computer components such as shift registers, a borrow-save adder, a counter and a small PLA as controller. The algorithm can be extended to find not only the greatest common divisor of two numbers A and B, but also a pair of integers (x, y) such that Ax + By = GCD(A, B), with the same time complexity. A scheme to cascade a number of such GCD chips to compute very large GCDs is also at hand, which alleviates a critical difficulty in such fields as cryptography.
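
To make the comparison-free idea concrete, the following is a minimal Python sketch of a plus-minus style iteration in the spirit of the Brent-Kung PM algorithm mentioned above. It is not the new algorithm of this paper, whose details the abstract does not give. The function name `pm_gcd` and the bound counters `alpha` and `beta` are illustrative assumptions; the counters track upper bounds on the operand sizes so that the swap decision needs no whole-word comparison, only a comparison of two small counters.

```python
import math


def pm_gcd(a: int, b: int) -> int:
    """GCD via a plus-minus style iteration (illustrative sketch only)."""
    if a == 0:
        return abs(b)
    if b == 0:
        return abs(a)

    # Strip the common power of two so that at least one operand is odd.
    shift = 0
    while a % 2 == 0 and b % 2 == 0:
        a >>= 1
        b >>= 1
        shift += 1
    if a % 2 == 0:
        a, b = b, a  # ensure a is odd

    # Invariant: |a| <= 2**alpha and |b| <= 2**beta.
    alpha = beta = max(abs(a).bit_length(), abs(b).bit_length())

    while b != 0:
        # Halve b until it is odd; only the lowest bit is inspected.
        while b % 2 == 0:
            b >>= 1
            beta -= 1
        # Swap based on the small counters, not on the operands themselves.
        if alpha >= beta:
            a, b = b, a
            alpha, beta = beta, alpha
        # a and b are both odd, so exactly one of a + b and b - a is
        # divisible by 4; picking it makes the new b even.
        if (a + b) % 4 == 0:
            b = (a + b) >> 1
        else:
            b = (b - a) >> 1
    return abs(a) << shift


# Sanity checks against the standard library.
assert pm_gcd(12, 18) == math.gcd(12, 18)
assert pm_gcd(-48, 36) == math.gcd(-48, 36)
```

Each pass through the outer loop keeps gcd(a, b) unchanged, because a is odd and the new b is (a + b)/2 or (b - a)/2. The sketch still swaps operands; avoiding that swap and fusing two such steps into one is what the abstract describes as the contribution of the new algorithm.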

