Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003: Recent Papers

Hardware synthesis for multi-dimensional time

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212828

A. Guillou, P. Quinton, T. Risset

Citations: 31

Performance-improved computation of very large word-length LNS addition/subtraction using signed-digit arithmetic

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212857

Chichyang Chen, Rui-Lin Chen

Abstract: Pipelined computation of very large word-length LNS addition/subtraction requires a significant amount of hardware and long pipeline latency. We propose a base-e exponential algorithm to simplify the exponential computation and to replace half of the pipeline stages by multiplication-and-accumulate operations. By using this approach, the circuit cost of the previously proposed 64-bit pipelined LNS addition/subtraction unit can be reduced by more than fifty percent. We also developed signed-digit (SD) algorithms to further enhance the performance of the LNS computation. From our analysis, the throughput of the 64-bit LNS unit can be increased by a factor of 4.62, and the pipeline latency can be reduced by a factor of seven. Furthermore, this SD approach can still save more than 50% of the table size and 27.6% of the circuit of the previously proposed LNS unit. The proposed approaches and algorithms have been verified by comprehensive simulations on the designed 32-bit SD hardware-reduced LNS unit. We have concluded that the proposed approaches can significantly improve the performance of very large word-length LNS addition/subtraction computation.

Citations: 5
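The identity underlying LNS addition/subtraction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It uses base-e logarithms, to match the base-e approach named in the abstract, and it assumes positive operands. The correction term ln(1 ± e^-d) is evaluated with floating-point `math.log1p`/`math.exp`. The sketch does not reproduce the paper's pipelined, signed-digit, or multiply-accumulate hardware algorithms. The names `lns_add` and `lns_sub` are hypothetical.

```python
import math


def lns_add(x: float, y: float) -> float:
    """Return ln(A + B) given x = ln(A) and y = ln(B), with A, B > 0.

    Identity: ln(e^x + e^y) = max(x, y) + ln(1 + e^-d), where d = |x - y|.
    Hardware would evaluate the correction term with tables or
    approximations; math.log1p/math.exp stand in for that unit here.
    """
    hi, lo = max(x, y), min(x, y)
    d = hi - lo
    return hi + math.log1p(math.exp(-d))


def lns_sub(x: float, y: float) -> float:
    """Return ln(A - B) given x = ln(A) > y = ln(B)."""
    if x <= y:
        raise ValueError("lns_sub requires x > y (that is, A > B)")
    d = x - y
    # ln(1 - e^-d) is the subtraction correction term; it diverges as d -> 0.
    return x + math.log1p(-math.exp(-d))


# Usage: add and subtract 3 and 5 entirely in the log domain.
a, b = math.log(3.0), math.log(5.0)
print(math.exp(lns_add(a, b)))              # approximately 8.0
print(math.exp(lns_sub(math.log(8.0), b)))  # approximately 3.0
```

Only the additions, comparisons and the correction function appear. The multiplications and divisions that LNS makes cheap (plain addition/subtraction of logarithms) are omitted.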

Arbitrary bit permutations in one or two cycles

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212847

Z. Shi, Xiao Yang, R. Lee

Abstract: Symmetric-key block ciphers encrypt data, providing data confidentiality over the public Internet. For interoperability reasons, it is desirable to support a variety of symmetric-key ciphers efficiently. We show the basic operations performed by a variety of symmetric-key cryptography algorithms. Of these basic operations, only bit permutation is very slow using existing processors, followed by integer multiplication. New instructions have been proposed recently to accelerate bit permutations in general-purpose processors, reducing the instructions needed to achieve an arbitrary n-bit permutation from O(n) to O(log(n)). However, the serial data-dependency between these log(n) permutation instructions prevents them from being executed in fewer than log(n) cycles, even on superscalar processors. Since application-specific instruction processors (ASIPs) have fewer constraints on maintaining standard processor datapath and control conventions, can we achieve even faster permutations? We propose six alternative ASIP approaches to achieve arbitrary 64-bit permutations in one or two cycles, using new BFLY and IBFLY instructions. This reduction to one or two cycles is achieved without increasing the cycle time. We compare the latencies of different permutation units in a technology-independent way to estimate cycle time impact. We also compare the alternative ASIP architectures and their efficiency in performing arbitrary 64-bit permutations.

Citations: 36
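A software model of a single butterfly stage shows why log2(n) stages suffice for structured permutations such as bit reversal. This is an illustrative sketch. It assumes that one stage conditionally swaps bit pairs a fixed power-of-two distance apart, under per-pair control bits. It does not reproduce the paper's BFLY/IBFLY instruction encodings or datapaths, and `butterfly_stage` and `reverse_bits` are hypothetical names. Arbitrary permutations generally need a butterfly network followed by an inverse butterfly network (together, a Beneš network), which is not shown here.

```python
def butterfly_stage(x: int, dist: int, ctrl: int, width: int = 64) -> int:
    """One butterfly stage: conditionally swap bits i and i + dist.

    Only positions i with (i & dist) == 0 (the lower bit of each pair) are
    read from ctrl; a set control bit swaps that pair. dist is assumed to be
    a power of two smaller than width.
    """
    lower = 0
    for i in range(width):
        if (i & dist) == 0:
            lower |= 1 << i
    m = lower & ctrl
    # Delta swap: t marks pairs whose two bits differ and whose control is set.
    t = ((x >> dist) ^ x) & m
    return x ^ t ^ (t << dist)


def reverse_bits(x: int, width: int = 8) -> int:
    """Reverse bit order by running log2(width) stages with every control set."""
    all_ones = (1 << width) - 1
    dist = 1
    while dist < width:
        x = butterfly_stage(x, dist, all_ones, width)
        dist <<= 1
    return x


# Usage: swap adjacent bit pairs in an 8-bit value, then reverse it.
print(bin(butterfly_stage(0b10110100, 1, 0b01010101, 8)))  # 0b1111000
print(bin(reverse_bits(0b10110100)))                        # 0b101101
```

Each stage depends on the previous stage's output. That serial dependency is why the abstract notes that log(n) permutation instructions cannot finish in fewer than log(n) cycles on a conventional processor.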

Application-specific DSP architecture for fast Fourier transform

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212860

K. L. Heo, Sung M. Cho, J. H. Lee, M. Sunwoo

Citations: 8

An architecture for a radix-4 modular pipeline fast Fourier transform

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212861

A. El-Khashab, E. Swartzlander

Citations: 13

Combined multiplication and sum-of-squares units

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212844

M. Schulte, L.P. Marquette, S. Krithivasan, E. G. Walters, C. Glossner

Citations: 11

Instruction set extension for fast elliptic curve cryptography over binary finite fields GF(2^m)

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212868

J. Großschädl, Guy-Armand Kamendje

Abstract: The performance of elliptic curve (EC) cryptosystems depends essentially on efficient arithmetic in the underlying finite field. Binary finite fields GF(2^m) have the advantage of "carry-free" addition. Multiplication, on the other hand, is rather costly since polynomial arithmetic is not supported by general-purpose processors. We propose a combined hardware/software approach to overcome this problem. First, we outline that multiplication of binary polynomials can be easily integrated into a multiplier datapath for integers without significant additional hardware. Then, we present new algorithms for multiple-precision arithmetic in GF(2^m) based on the availability of an instruction for single-precision multiplication of binary polynomials. The proposed hardware/software approach is considerably faster than a "conventional" software implementation and well suited for constrained devices like smart cards. Our experimental results show that an enhanced 16-bit RISC processor is able to generate a 191-bit ECDSA signature in less than 650 msec when the core is clocked at 5 MHz.

Citations: 53
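Multiplying binary polynomials differs from integer multiplication only in how partial products are combined: XOR instead of addition with carries. The sketch below illustrates this together with a multiple-precision product built from single-precision word products. It assumes a 16-bit word size, chosen to match the 16-bit RISC core mentioned in the abstract. It models the hypothetical single-precision instruction with a Python function `clmul`. All names (`clmul`, `to_words`, `from_words`, `poly_mul_words`) are illustrative, not the paper's algorithms. Full GF(2^m) multiplication also requires reducing the product modulo the field's irreducible polynomial, which is not shown.

```python
WORD = 16                  # assumed word size (16-bit core)
MASK = (1 << WORD) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less product of binary polynomials a(x) * b(x) over GF(2).

    Bit i of an integer is the coefficient of x^i. Shifted copies of a are
    combined with XOR, so no carries propagate between bit positions.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def to_words(x: int, n: int) -> list[int]:
    """Split x into n little-endian WORD-bit words."""
    return [(x >> (WORD * i)) & MASK for i in range(n)]


def from_words(words: list[int]) -> int:
    """Reassemble little-endian WORD-bit words into one integer."""
    result = 0
    for i, w in enumerate(words):
        result |= w << (WORD * i)
    return result


def poly_mul_words(a_words: list[int], b_words: list[int]) -> list[int]:
    """Schoolbook multiple-precision multiplication in GF(2)[x].

    Each clmul call on two WORD-bit words stands in for one single-precision
    binary-polynomial multiply instruction; its (2*WORD - 1)-bit result is
    split into low and high words and XOR-accumulated (no carry handling).
    """
    out = [0] * (len(a_words) + len(b_words))
    for i, ai in enumerate(a_words):
        for j, bj in enumerate(b_words):
            p = clmul(ai, bj)
            out[i + j] ^= p & MASK
            out[i + j + 1] ^= p >> WORD
    return out


# Usage: (x + 1)^2 = x^2 + 1 over GF(2), and a 3-word (48-bit) product.
print(bin(clmul(0b11, 0b11)))  # 0b101
a, b = 0x1234_5678_9ABC, 0xFEDC_BA98_7654
print(from_words(poly_mul_words(to_words(a, 3), to_words(b, 3))) == clmul(a, b))  # True
```

Because word products are combined with XOR, no carry ever crosses a word boundary. This is the "carry-free" property the abstract highlights.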

A generic tool-set for SoC multiprocessor debugging and synchronization

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212840

Andreas Wieferink, Tim Kogel, A. Nohl, A. Hoffmann

Citations: 9

Automatic instruction set extension and utilization for embedded processors

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212834

A. Peymandoust, L. Pozzi, P. Ienne, G. Micheli

Abstract: There is a growing demand for application-specific embedded processors in system-on-a-chip designs. Current tools and design methodologies often require designers to manually specialize the processor based on an application. Moreover, the use of the new complex instructions added to the processor is often left to designers' ingenuity. We present a solution that automatically groups dataflow operations in the application software as potential new complex instructions. The set of possible instructions is then automatically used for code generation combined with high-level arithmetic optimizations using symbolic algebra. Symbolic arithmetic manipulations provide a novel and effective method for instruction selection that is necessary due to the complexity of the automatically identified instructions. We have used our methodology to automatically add new instructions to Tensilica processors for a set of examples. Our results show that our tools improve designers' productivity and efficiently specialize an embedded processor for the given application such that the execution time is greatly improved.

Citations: 85

Decimal multiplication via carry-save addition

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003 Pub Date : 2003-06-24 DOI: 10.1109/ASAP.2003.1212858

M. A. Erle, M. Schulte

Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. We present two novel designs for fixed-point decimal multiplication that utilize decimal carry-save addition to reduce the critical path delay. First, a multiplier that stores a reduced number of multiplicand multiples and uses decimal carry-save addition in the iterative portion of the design is presented. Then, a second multiplier design is proposed with several notable improvements including fast generation of multiplicand multiples that do not need to be stored, the use of decimal (4:2) compressors, and a simplified decimal carry-propagate addition to produce the final product. When multiplying two n-digit operands to produce a 2n-digit product, the improved multiplier design has a worst-case latency of n+4 cycles and an initiation interval of n+1 cycles. Three data-dependent optimizations, which help reduce the multipliers' average latency, are also described. The multipliers presented can be extended to support decimal floating-point multiplication.

Citations: 169
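The idea behind decimal carry-save addition can be shown with a small digit-level model. Each position keeps a sum digit (0 to 9) and a separate carry bit. Adding one more operand never ripples a carry across positions, because each position's total is at most 9 + 1 + 9 = 19. This is a simplified sketch under stated assumptions. It accumulates one multiplier digit per iteration with a 3-input decimal carry-save step rather than the (4:2) compressors described in the abstract. It forms each multiplicand multiple directly with Python integers, standing in for the paper's multiple-generation logic. It also performs the final carry-propagate addition with Python integers. The names `csa_add` and `decimal_multiply` are hypothetical.

```python
def to_digits(x: int, n: int) -> list[int]:
    """Little-endian decimal digits of x, padded to n digits."""
    return [(x // 10**i) % 10 for i in range(n)]


def from_digits(digits: list[int]) -> int:
    """Value of a little-endian list of decimal digits (or carry bits)."""
    return sum(d * 10**i for i, d in enumerate(digits))


def csa_add(s: list[int], c: list[int], x: list[int]) -> tuple[list[int], list[int]]:
    """Decimal carry-save step: (sum digits s, carry bits c) + digits x.

    Each position computes s[i] + c[i] + x[i] <= 19 independently; the units
    digit stays at position i and the tens digit (0 or 1) becomes the carry
    into position i + 1, so no carry propagates within a step.
    """
    n = len(s)
    new_s, new_c = [0] * n, [0] * n
    for i in range(n):
        t = s[i] + c[i] + x[i]
        new_s[i] = t % 10
        if i + 1 < n:
            new_c[i + 1] = t // 10
        elif t >= 10:
            raise OverflowError("result needs more digits")
    return new_s, new_c


def decimal_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-digit decimal operands into a 2n-digit product."""
    width = 2 * n
    s, c = [0] * width, [0] * width
    for k, digit in enumerate(to_digits(b, n)):
        # Assumption: the shifted multiple a * digit * 10^k is formed directly
        # with Python integers, standing in for multiple-generation hardware.
        s, c = csa_add(s, c, to_digits(a * digit * 10**k, width))
    # Final carry-propagate addition, done with Python integers for brevity.
    return from_digits(s) + from_digits(c)


# Usage: 4-digit operands produce an 8-digit product.
print(decimal_multiply(1234, 5678, 4))  # 7006652
```

The per-iteration critical path in this model is a single digit-position add, independent of n. Only the final step resolves carries across the whole result.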