Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003: Latest Publications

Hardware synthesis for multi-dimensional time
A. Guillou, P. Quinton, T. Risset
DOI: https://doi.org/10.1109/ASAP.2003.1212828
Published: 2003-06-24
Abstract: We introduce some basic principles for extending the classical systolic synthesis methodology to multidimensional time. Multidimensional scheduling enables complex algorithms that do not admit linear schedules to be parallelized, but it also requires the use of memories in the architecture. We explain how to obtain compatible allocation and memory functions for VLSI (or SIMD-like code) generation. We also present an original mechanism for controlling a VLSI architecture that has a multidimensional schedule. A structural VHDL code has been derived and synthesized (for implementation on FPGA platforms) using these systematic design principles. These results are preliminary steps to the hardware synthesis for multidimensional time.
Citations: 31

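As a minimal sketch of why multidimensional time helps in Guillou, Quinton, and Risset's setting, the Python below checks whether a schedule respects the dependences of a small, hypothetical n x n loop nest (the loop nest, its dependences, and all function names are assumptions for illustration, not taken from the paper). A two-dimensional schedule theta(i, j) = (i, j), compared lexicographically, is valid for every n. A one-dimensional linear schedule t = a*i + b*j is valid only when b >= 1 and a >= 1 + b*(n - 1), so its coefficients must grow with the problem size.

```python
from itertools import product

def deps(i, j, n):
    """Predecessors of iteration (i, j) in a hypothetical n x n loop nest."""
    preds = []
    if j > 0:
        preds.append((i, j - 1))          # short-range dependence along j
    if i > 0 and j == 0:
        preds.append((i - 1, n - 1))      # long-range dependence across rows
    return preds

def is_valid(schedule, n):
    """True if every iteration is scheduled strictly after all its predecessors.
    Schedules return tuples, which Python compares lexicographically."""
    for i, j in product(range(n), repeat=2):
        for src in deps(i, j, n):
            if not schedule(*src) < schedule(i, j):
                return False
    return True

def two_dim(i, j):
    return (i, j)                         # 2-D time: row index is the major dimension

def one_dim(a, b):
    return lambda i, j: (a * i + b * j,)  # 1-D linear time t = a*i + b*j
```

For example, is_valid(one_dim(8, 1), 8) holds but is_valid(one_dim(8, 1), 16) does not, while is_valid(two_dim, n) holds for any n.
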
Performance-improved computation of very large word-length LNS addition/subtraction using signed-digit arithmetic
Chichyang Chen, Rui-Lin Chen
DOI: https://doi.org/10.1109/ASAP.2003.1212857
Published: 2003-06-24
Abstract: Pipelined computation of very large word-length LNS addition/subtraction requires a significant amount of hardware and long pipeline latency. We propose a base-e exponential algorithm to simplify the exponential computation and to replace half of the pipeline stages by multiplication-and-accumulate operations. By using this approach, the circuit cost of the previously proposed 64 bit pipelined LNS addition/subtraction unit can be reduced by more than fifty percent. We also developed signed-digit (SD) algorithms to further enhance the performance of the LNS computation. From our analysis, the throughput of the 64 bit LNS unit can be increased by a factor of 4.62, and the pipeline latency can be reduced by a factor of seven. Furthermore, this SD approach can still save more than 50% of the table size and 27.6% of the circuit of the previously proposed LNS unit. The proposed approaches and algorithms have been verified by comprehensive simulations on the designed 32 bit SD hardware-reduced LNS unit. We have concluded that the proposed approaches can significantly improve the performance of very large word-length LNS addition/subtraction computation.
Citations: 5

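In a logarithmic number system (LNS), addition and subtraction reduce to evaluating a nonlinear function of the difference between the two stored logarithms, which is what makes very large word lengths expensive in hardware. The sketch below shows only that underlying identity in floating-point Python, storing values as natural logarithms (an assumption; many LNS units use base 2). It is not the paper's pipelined, base-e exponential, or signed-digit algorithm, signs are not modeled, and the function names are illustrative.

```python
import math

# Values are stored as natural logarithms of positive magnitudes (assumption);
# sign handling, which a real LNS unit needs, is omitted.

def lns_add(x, y):
    """Return ln(X + Y) from x = ln X and y = ln Y."""
    hi, lo = max(x, y), min(x, y)
    # ln(X + Y) = max + ln(1 + e^-(max - min))
    return hi + math.log1p(math.exp(lo - hi))

def lns_sub(x, y):
    """Return ln|X - Y| from x = ln X and y = ln Y, with X != Y."""
    hi, lo = max(x, y), min(x, y)
    if hi == lo:
        raise ValueError("X - Y = 0 has no logarithmic representation")
    # ln|X - Y| = max + ln(1 - e^-(max - min))
    return hi + math.log1p(-math.exp(lo - hi))
```

In hardware, the ln(1 + e^-d) and ln(1 - e^-d) terms are exactly what table lookups or exponential/logarithm pipelines must approximate; the floating-point calls here hide that cost.
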
Arbitrary bit permutations in one or two cycles
Z. Shi, Xiao Yang, R. Lee
DOI: https://doi.org/10.1109/ASAP.2003.1212847
Published: 2003-06-24
Abstract: Symmetric-key block ciphers encrypt data, providing data confidentiality over the public Internet. For interoperability reasons, it is desirable to support a variety of symmetric-key ciphers efficiently. We show the basic operations performed by a variety of symmetric-key cryptography algorithms. Of these basic operations, only bit permutation is very slow using existing processors, followed by integer multiplication. New instructions have been proposed recently to accelerate bit permutations in general-purpose processors, reducing the instructions needed to achieve an arbitrary n-bit permutation from O(n) to O(log(n)). However, the serial data-dependency between these log(n) permutation instructions prevents them from being executed in fewer than log(n) cycles, even on superscalar processors. Since application specific instruction processors (ASIPs) have fewer constraints on maintaining standard processor datapath and control conventions, can we achieve even faster permutations? We propose six alternative ASIP approaches to achieve arbitrary 64 bit permutations in one or two cycles, using new BFLY and IBFLY instructions. This reduction to one or two cycles is achieved without increasing the cycle time. We compare the latencies of different permutation units in a technology independent way to estimate cycle time impact. We also compare the alternative ASIP architectures and their efficiency in performing arbitrary 64 bit permutations.
Citations: 36

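The BFLY and IBFLY instructions named in Shi, Yang, and Lee's abstract are built around butterfly networks. As a rough functional model (an assumption; the paper's control-bit encoding is not reproduced here), one butterfly stage with distance s conditionally swaps bit pairs (p, p + s), and six stages with s = 32, 16, ..., 1 form a 64-bit butterfly network. This Python sketch applies the stages one after another using the classic delta-swap trick; a BFLY unit would perform all six in a single instruction, and following a butterfly with an inverse butterfly yields a Benes-style network that can realize any permutation. The stage order and helper names are illustrative.

```python
MASK64 = (1 << 64) - 1
SHIFTS = (32, 16, 8, 4, 2, 1)  # stage order is an assumption for illustration

def stage_positions(shift):
    """Bit positions p that are the lower element of a pair (p, p + shift)."""
    return sum(1 << p for p in range(64) if not p & shift)

def butterfly_stage(x, shift, ctrl):
    """Swap bits p and p + shift wherever ctrl has bit p set (delta swap)."""
    ctrl &= stage_positions(shift)
    t = ((x >> shift) ^ x) & ctrl      # 1 where the pair differs and a swap is requested
    return (x ^ t ^ (t << shift)) & MASK64

def bfly(x, ctrls):
    """Apply all six stages; ctrls holds one 64-bit control word per stage."""
    for shift, ctrl in zip(SHIFTS, ctrls):
        x = butterfly_stage(x, shift, ctrl)
    return x
```

Setting every control bit moves bit p to bit 63 - p, reversing the word, while clearing all control bits leaves x unchanged.
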
Application-specific DSP architecture for fast Fourier transform
K. L. Heo, Sung M. Cho, J. H. Lee, M. Sunwoo
DOI: https://doi.org/10.1109/ASAP.2003.1212860
Published: 2003-06-24
Abstract: We present ASDSP (application-specific digital signal processor) instructions and their hardware architecture for high-speed FFT. The proposed instructions calculate a butterfly within two cycles. The proposed architecture employs a data processing unit (DPU) supporting the instructions and an FFT address generation unit (FAGU) automatically calculating the butterfly input and output data addresses. The proposed DPU has a smaller area than commercial DSP chips. Moreover, the number of FFT computation cycles is reduced by the proposed FAGU. The architecture has been modeled in VHDL. We have used the UMC 0.25 μm standard cell library for logic synthesis. Performance comparisons show that the number of execution cycles is reduced by over 10% and the size of the DPU decreases by about 30% compared with the Carmel DSP.
Citations: 8

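The address pattern that a unit like the FAGU automates can be sketched for a textbook in-place radix-2 decimation-in-time FFT (an assumption; the abstract does not state the radix or algorithm variant). Here butterfly_addresses, a hypothetical name, yields each butterfly's stage, its two operand addresses, and its twiddle-factor index, and fft simply consumes that stream, so the arithmetic loop performs no address computation of its own.

```python
import cmath

def butterfly_addresses(n):
    """Yield (stage, a, b, twiddle_index) for every butterfly of an in-place
    radix-2 decimation-in-time FFT of size n (n a power of two)."""
    span, stage = 1, 0
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                yield stage, start + k, start + k + span, k * (n // (2 * span))
        span *= 2
        stage += 1

def bit_reverse(i, bits):
    return int(format(i, f"0{bits}b")[::-1], 2)

def fft(x):
    n = len(x)
    bits = n.bit_length() - 1
    data = [x[bit_reverse(i, bits)] for i in range(n)]   # input in bit-reversed order
    for _, a, b, k in butterfly_addresses(n):
        w = cmath.exp(-2j * cmath.pi * k / n)           # twiddle factor W_n^k
        t = w * data[b]
        data[a], data[b] = data[a] + t, data[a] - t     # one radix-2 butterfly
    return data
```

For n = 4 the generator yields (0, 0, 1, 0), (0, 2, 3, 0), (1, 0, 2, 0), (1, 1, 3, 1): two stages of two butterflies each.
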
An architecture for a radix-4 modular pipeline fast Fourier transform
A. El-Khashab, E. Swartzlander
DOI: https://doi.org/10.1109/ASAP.2003.1212861
Published: 2003-06-24
Abstract: We present a radix-4 modular pipeline architecture for computing the discrete Fourier transform (DFT). For an N-point DFT, two conventional pipeline √N-point fast Fourier transform (FFT) modules are joined by a specialized center element. The center element contains memories, coefficient ROMs, multipliers, and control logic. Compared with a standard N-point pipeline FFT, the modular FFT significantly reduces the number of delay lines to 2√N. Further, the coefficient storage is concentrated within the center element, thereby reducing the ROM requirement within the pipeline FFT modules. The centralized memory and address generator provide data storage and reordering. The architecture has been analyzed through simulation and compared to the conventional pipeline FFT. The throughput of a standard radix-4 pipeline FFT is maintained with a slightly higher end-to-end latency. A reduction in power is achieved because the modular pipeline exhibits N/2 bit transitions on each clock as compared to y bit transitions in the conventional pipeline.
Citations: 13

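Joining two √N-point FFT modules with a center element mirrors the Cooley-Tukey factorization N = √N · √N. With m = √N, n = m·n1 + n2 and k = k1 + m·k2 (index names chosen here), X[k1 + m·k2] = Σ_{n2} W_m^{n2·k2} · W_N^{n2·k1} · Σ_{n1} x[m·n1 + n2] · W_m^{n1·k1}. The sketch below computes the first-module DFTs, applies the center-element twiddles W_N^{n2·k1}, then computes the second-module DFTs. A plain direct DFT stands in for each pipeline module and the list reshuffling plays the role of the center element's reordering memory, so this is a functional model under those assumptions, not the paper's hardware.

```python
import cmath
import math

def dft(x):
    """Direct DFT; stands in for one sqrt(N)-point pipeline FFT module."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def modular_fft(x):
    n = len(x)
    m = math.isqrt(n)
    if m * m != n:
        raise ValueError("length must be a perfect square")
    # First module: m-point DFT over n1 for each fixed n2 -> first[n2][k1]
    first = [dft([x[m * n1 + n2] for n1 in range(m)]) for n2 in range(m)]
    # Center element: multiply by twiddle factors W_N^(n2*k1)
    for n2 in range(m):
        for k1 in range(m):
            first[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / n)
    # Second module: m-point DFT over n2 for each fixed k1, written to k = k1 + m*k2
    out = [0j] * n
    for k1 in range(m):
        column = dft([first[n2][k1] for n2 in range(m)])
        for k2 in range(m):
            out[k1 + m * k2] = column[k2]
    return out
```

For N = 16 each module is a 4-point transform, which a radix-4 pipeline handles in a single stage.
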
Combined multiplication and sum-of-squares units
M. Schulte, L.P. Marquette, S. Krithivasan, E. G. Walters, C. Glossner
DOI: https://doi.org/10.1109/ASAP.2003.1212844
Published: 2003-06-24
Abstract: Multiplication and squaring are important operations in digital signal processing and multimedia applications. We present designs for units that implement either multiplication, A×B, or sum-of-squares computations, A²+B², based on an input control signal. Compared to conventional parallel multipliers, these units have a modest increase in area and delay, but allow either multiplication or sum-of-squares computations to be performed. Combined multiplication and sum-of-squares units for unsigned and two's complement operands are presented, along with integrated designs that can operate on either unsigned or two's complement operands. The designs can also be extended to work with a third accumulator operand to compute either Z+A×B or Z+A²+B². Synthesis results indicate that a combined multiplication and sum-of-squares unit for 32-bit two's complement operands can be implemented with roughly 15% more area and nearly the same worst case delay as a conventional 32-bit two's complement multiplier.
Citations: 11

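One way to see why a sum-of-squares mode costs only modest extra multiplier hardware: squaring has a symmetric partial-product array that can be folded. Using a_i·a_i = a_i and a_i·a_j + a_j·a_i = 2·a_i·a_j, an n-bit square needs n(n+1)/2 partial-product bits, so A² + B² needs n(n+1) bits versus n² for A×B. The sketch below models that for unsigned operands only; it assumes, without the paper's details, that its designs exploit a similar folding, it does not model the two's complement handling, and reduce_terms is a hypothetical stand-in for the adder tree.

```python
def folded_square_terms(a, n):
    """Partial-product bits (weight exponent, bit) of a*a for an n-bit unsigned a,
    using a_i*a_i = a_i and a_i*a_j + a_j*a_i = 2*a_i*a_j."""
    bits = [(a >> i) & 1 for i in range(n)]
    terms = []
    for i in range(n):
        terms.append((2 * i, bits[i]))                      # diagonal term
        for j in range(i + 1, n):
            terms.append((i + j + 1, bits[i] & bits[j]))    # folded off-diagonal pair
    return terms

def multiply_terms(a, b, n):
    """The n*n partial-product bits of an unsigned array multiplier."""
    return [(i + j, ((a >> i) & 1) & ((b >> j) & 1)) for i in range(n) for j in range(n)]

def reduce_terms(terms):
    """Stand-in for the adder tree: sum every bit at its weight."""
    return sum(bit << w for w, bit in terms)

def combined_unit(a, b, n, sum_of_squares):
    """Return a*b, or a*a + b*b when sum_of_squares (the control input) is set."""
    if sum_of_squares:
        terms = folded_square_terms(a, n) + folded_square_terms(b, n)
    else:
        terms = multiply_terms(a, b, n)
    return reduce_terms(terms)
```

For 8-bit operands, the multiply mode produces 64 partial-product bits and the sum-of-squares mode produces 72, so both modes fit in a similarly sized reduction tree.
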
Instruction set extension for fast elliptic curve cryptography over binary finite fields GF(2^m)
J. Großschädl, Guy-Armand Kamendje
DOI: https://doi.org/10.1109/ASAP.2003.1212868
Published: 2003-06-24
Abstract: The performance of elliptic curve (EC) cryptosystems depends essentially on efficient arithmetic in the underlying finite field. Binary finite fields GF(2^m) have the advantage of "carry-free" addition. Multiplication, on the other hand, is rather costly since polynomial arithmetic is not supported by general-purpose processors. We propose a combined hardware/software approach to overcome this problem. First, we outline that multiplication of binary polynomials can be easily integrated into a multiplier datapath for integers without significant additional hardware. Then, we present new algorithms for multiple-precision arithmetic in GF(2^m) based on the availability of an instruction for single-precision multiplication of binary polynomials. The proposed hardware/software approach is considerably faster than a "conventional" software implementation and well suited for constrained devices like smart cards. Our experimental results show that an enhanced 16 bit RISC processor is able to generate a 191 bit ECDSA signature in less than 650 msec when the core is clocked at 5 MHz.
Citations: 53

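The hardware/software split described by Großschädl and Kamendje can be sketched as follows, assuming a hypothetical single-precision instruction that multiplies two 16-bit binary polynomials without carries (clmul_word models it in Python). Multiple-precision multiplication in GF(2^m) then becomes schoolbook word multiplication with XOR in place of addition, followed by reduction modulo an irreducible polynomial. The operand-scanning loop and bit-by-bit reduction are illustrative choices, not the paper's algorithms.

```python
W = 16  # word size, matching a 16-bit core (assumption)

def clmul_word(a, b):
    """Carry-less (XOR) product of two W-bit words; result has up to 2W-1 bits.
    Models a hypothetical single-precision binary-polynomial multiply instruction."""
    r = 0
    for i in range(W):
        if (b >> i) & 1:
            r ^= a << i
    return r

def to_words(x, k):
    """Split x into k W-bit words, least significant first."""
    return [(x >> (W * i)) & ((1 << W) - 1) for i in range(k)]

def clmul_multi(x, y, k):
    """Multiple-precision polynomial product of two k-word operands."""
    xs, ys = to_words(x, k), to_words(y, k)
    r = 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            r ^= clmul_word(xi, yj) << (W * (i + j))   # XOR replaces integer addition
    return r

def gf2_reduce(r, modulus):
    """Reduce polynomial r modulo an irreducible polynomial of degree m."""
    m = modulus.bit_length() - 1
    while r.bit_length() > m:
        r ^= modulus << (r.bit_length() - 1 - m)
    return r
```

With the AES field polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) as a small test modulus, the product of 0x57 and 0x83 reduces to 0xC1, matching the worked example in FIPS-197.
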
A generic tool-set for SoC multiprocessor debugging and synchronization
Andreas Wieferink, Tim Kogel, A. Nohl, A. Hoffmann
DOI: https://doi.org/10.1109/ASAP.2003.1212840
Published: 2003-06-24
Abstract: Current and future SoC designs will contain an increasing number of programmable units. To be able to tailor and debug these processors in their system context at the highest possible overall simulation speed, we propose a methodology and the necessary tooling for a multiprocessor debugging environment which allows a flexible runtime tradeoff between observability and simulation speed. This approach has been applied on a complex SoC case study.
Citations: 9

Automatic instruction set extension and utilization for embedded processors
A. Peymandoust, L. Pozzi, P. Ienne, G. Micheli
DOI: https://doi.org/10.1109/ASAP.2003.1212834
Published: 2003-06-24
Abstract: There is a growing demand for application-specific embedded processors in system-on-a-chip designs. Current tools and design methodologies often require designers to manually specialize the processor based on an application. Moreover, the use of the new complex instructions added to the processor is often left to designers' ingenuity. We present a solution that automatically groups dataflow operations in the application software as potential new complex instructions. The set of possible instructions is then automatically used for code generation combined with high-level arithmetic optimizations using symbolic algebra. Symbolic arithmetic manipulations provide a novel and effective method for instruction selection that is necessary due to the complexity of the automatically identified instructions. We have used our methodology to automatically add new instructions to Tensilica processors for a set of examples. Our results show that our tools improve designers' productivity and efficiently specialize an embedded processor for the given application such that the execution time is greatly improved.
Citations: 85

Decimal multiplication via carry-save addition
M. A. Erle, M. Schulte
DOI: https://doi.org/10.1109/ASAP.2003.1212858
Published: 2003-06-24
Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. We present two novel designs for fixed-point decimal multiplication that utilize decimal carry-save addition to reduce the critical path delay. First, a multiplier that stores a reduced number of multiplicand multiples and uses decimal carry-save addition in the iterative portion of the design is presented. Then, a second multiplier design is proposed with several notable improvements including fast generation of multiplicand multiples that do not need to be stored, the use of decimal (4:2) compressors, and a simplified decimal carry-propagate addition to produce the final product. When multiplying two n-digit operands to produce a 2n-digit product, the improved multiplier design has a worst-case latency of n+4 cycles and an initiation interval of n+1 cycles. Three data-dependent optimizations, which help reduce the multipliers' average latency, are also described. The multipliers presented can be extended to support decimal floating-point multiplication.
Citations: 169

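A minimal model of decimal carry-save accumulation, under simplifying assumptions: each digit position keeps a sum digit (0-9) and a saved carry bit from the position below, and adding a new digit vector updates every position independently, so no carry ripples across the word. The iterative multiplier below adds one precomputed multiplicand multiple per multiplier digit and finishes with a single carry-propagate addition. It does not model Erle and Schulte's reduced set of stored multiples, decimal (4:2) compressors, or BCD encoding, and all names are illustrative.

```python
def to_digits(x, width):
    """Decimal digits of x, least significant first."""
    return [(x // 10 ** i) % 10 for i in range(width)]

def csa_step(s, c, addend):
    """One decimal carry-save step. s[i] is a sum digit (0-9); c[i] is the carry
    saved into position i from position i-1, so the value is sum((s[i]+c[i])*10^i).
    Each position adds independently; no carry ripples."""
    new_s, new_c = [], [0]
    for si, ci, ai in zip(s, c, addend):
        t = si + ci + ai              # at most 9 + 1 + 9 = 19, so the carry is one bit
        new_s.append(t % 10)
        new_c.append(t // 10)
    return new_s, new_c[:len(s)]      # top carry is always 0 when the width fits the result

def decimal_multiply(a, b, n):
    """Multiply two n-digit decimal numbers with carry-save accumulation."""
    width = 2 * n
    multiples = [k * a for k in range(10)]        # precomputed multiples 0A..9A
    s, c = [0] * width, [0] * width
    for j, bj in enumerate(to_digits(b, n)):      # one multiplier digit per iteration
        s, c = csa_step(s, c, to_digits(multiples[bj] * 10 ** j, width))
    # Final carry-propagate addition of the sum and carry vectors
    return sum((si + ci) * 10 ** i for i, (si, ci) in enumerate(zip(s, c)))
```

The iterative loop performs n carry-save steps whose delay does not depend on the operand width; only the final addition propagates carries.
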