Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Latest papers, page 2

Common subexpression elimination involving multiple variables for linear DSP synthesis

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10022

Anup Hosangadi, F. Fallah, R. Kastner

Citations: 21

A public-key cryptographic processor for RSA and ECC

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10015

H. Eberle, N. Gura, S. C. Shantz, Vipul Gupta, L. Rarick, S. Sundaram

Abstract: We describe a general-purpose processor architecture for accelerating public-key computations on server systems that demand high performance and flexibility to accommodate large numbers of secure connections with heterogeneous clients that are likely to be limited in the set of cryptographic algorithms supported. Flexibility is achieved in that the processor supports multiple public-key cryptosystems, namely RSA, DSA, DH, and ECC, arbitrary key sizes and, in the case of ECC, arbitrary curves over fields GF(p) and GF(2^m). At the core of the processor is a novel dual-field multiplier based on a modified carry-save adder (CSA) tree that supports both GF(p) and GF(2^m). In the case of a 64-bit integer multiplier, the necessary modifications increase its size by a mere 5%. To efficiently schedule the multiplier, we implemented a multiply-accumulate instruction that combines several steps of a multiple-precision multiplication in a single operation: multiplication, carry propagation, and partial product accumulation. We have developed a hardware prototype of the cryptographic processor in FPGA technology. If implemented in current 1.5 GHz processor technology, the processor executes 5,265 RSA-1024 op/s and 25,756 ECC-163 op/s; the given key sizes offer comparable security strength. Looking at future security levels, performance is 786 op/s for RSA-2048 and 9,576 op/s for ECC-233.
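The abstract above names two ideas that a short sketch can make concrete: a multiply-accumulate step that folds multiplication, carry propagation, and partial-product accumulation into one operation, and a multiplier that serves both GF(p) (integer arithmetic with carries) and GF(2^m) (polynomial arithmetic without carries). The Python below is only an illustration under stated assumptions. The names `mac`, `mp_mul`, `clmul`, `to_words`, and `from_words` are hypothetical and do not come from the paper, and the code models the arithmetic, not the processor's instruction set or its CSA tree.

```python
W = 64                 # word size in bits; 64 matches the multiplier size in the abstract
MASK = (1 << W) - 1


def mac(a, b, acc, carry):
    """Hypothetical multiply-accumulate: (low word, carry-out) of a*b + acc + carry.

    For W-bit inputs the sum never exceeds 2**(2*W) - 1, so two words suffice.
    """
    t = a * b + acc + carry
    return t & MASK, t >> W


def mp_mul(x, y):
    """Schoolbook (operand-scanning) product of two little-endian word lists."""
    z = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            # One mac call covers multiply, accumulate, and carry propagation.
            z[i + j], carry = mac(xi, yj, z[i + j], carry)
        z[i + len(y)] = carry
    return z


def clmul(a, b):
    """Carry-less product of two non-negative ints, i.e. multiplication in GF(2)[x].

    XOR replaces addition, so no carries propagate. Reduction modulo an
    irreducible polynomial, needed for a real GF(2^m) product, is omitted.
    """
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def to_words(n, k):
    """Split a non-negative int into k little-endian W-bit words."""
    return [(n >> (W * i)) & MASK for i in range(k)]


def from_words(ws):
    """Inverse of to_words."""
    return sum(w << (W * i) for i, w in enumerate(ws))
```

For example, `from_words(mp_mul(to_words(a, 2), to_words(b, 2)))` equals `a * b` for any two 128-bit operands, while `clmul(0b11, 0b11)` yields `0b101`, since (x + 1)^2 = x^2 + 1 over GF(2). The only difference between the two inner operations is whether carries propagate, which is why a single datapath can plausibly serve both fields.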

Citations: 83

Hyper-programmable architectures for adaptable networked systems

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10037

G. Brebner, P. James-Roxby, Eric Keller, C. Kulkarni

Abstract: We explain how modern programmable logic devices have capabilities that are well suited for them to assume a central role in the implementation of networked systems, now and in the future. To date, such devices have featured largely in ASIC substitution roles within networked systems; this usage has been highly successful, allowing faster times to market and reduced engineering costs. We argue that there are many additional opportunities for productively using these devices. The requirement is exposure of their high inherent computational concurrency matched by concurrent memory accessibility, their rich on-chip interconnectivity and their complete programmability, at a higher level of abstraction that matches the implementation needs of networked systems. We discuss specific examples supporting this view, and present a highly flexible soft platform architecture at an appropriate level of abstraction from physical devices. This may be viewed as a particularly configurable and programmable type of network processor, offering scope both for innovative networked system implementation and for new directions in networking research. In particular, it is aimed at facilitating scalable solutions, matching differently resourced programmable logic devices to differing performance and sophistication requirements of networked systems, from cheap consumer appliances to high-end network switching.

Citations: 14

A digit-serial algorithm for the discrete logarithm modulo 2^k

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.1

A. Fit-Florea, D. Matula

Citations: 10

Automatic synthesis of customized local memories for multicluster application accelerators

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10030

M. Kudlur, Kevin Fan, Michael L. Chu, S. Mahlke

Abstract: Distributed local memories, or scratchpads, have been shown to effectively reduce cost and power consumption of application-specific accelerators while maintaining performance. The design of the local memory organization must take several factors into account, including the memory bandwidth and size requirements of the program and the distribution of program data among the memories. In addition, when register structures and function units in the accelerator are clustered, the effects of intercluster communication should be taken into account. This work proposes a technique to synthesize the local memory architecture of a clustered accelerator using a phase-ordered approach. First, the dataflow graph is pre-partitioned to define a performance-centric grouping of the operations. Second, memory synthesis is performed by combining multiple data structures into a set of physical memories that minimizes cost while maintaining a performance threshold. Finally, post-partitioning is performed to determine the final assignment of operations to clusters given the memory organization. Results show that customization reduces memory cost from 2% to 59% over a naive scheme that utilizes one physical memory per program data structure. Further, pre-partitioning is shown to reduce the intercluster communication required to achieve a fixed performance.
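The memory-synthesis step in the abstract above, combining several data structures into fewer physical memories without violating a performance threshold, can be pictured as a packing problem. The sketch below is not the paper's algorithm. It swaps in plain first-fit-decreasing bin packing, and it assumes a hypothetical model in which each data structure demands some number of accesses per cycle and each physical memory can serve at most `ports_per_memory` accesses per cycle. The function name `pack_memories` and the example demands are invented for illustration.

```python
def pack_memories(structs, ports_per_memory):
    """Group data structures into physical memories by first-fit decreasing.

    structs: dict mapping a data-structure name to its bandwidth demand
             (accesses per cycle); a hypothetical stand-in for the
             performance threshold mentioned in the abstract.
    Returns a list of groups, one list of names per physical memory.
    """
    memories = []  # each entry is [remaining_bandwidth, [names]]
    # Place the most demanding structures first.
    for name, demand in sorted(structs.items(), key=lambda kv: -kv[1]):
        if demand > ports_per_memory:
            raise ValueError(f"{name} exceeds the bandwidth of a single memory")
        for mem in memories:
            if mem[0] >= demand:          # first memory with enough spare bandwidth
                mem[0] -= demand
                mem[1].append(name)
                break
        else:
            # No existing memory can host it: allocate a new one.
            memories.append([ports_per_memory - demand, [name]])
    return [names for _, names in memories]


# Hypothetical demands; the naive scheme would use four memories, one per structure.
groups = pack_memories({"A": 1.0, "B": 0.5, "C": 0.5, "D": 0.25}, ports_per_memory=1)
```

With these made-up demands the four structures fit into three memories, because `B` and `C` together saturate one port. A real synthesis flow would also weigh memory size and the intercluster communication that the abstract mentions, both of which this sketch ignores.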

Citations: 7

Efficient on-chip communications for data-flow IPs

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10036

A. Fraboulet, T. Risset

Citations: 11

Parallel Montgomery multipliers

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10008

M. O. Sanu, E. Swartzlander, C. Chase

Citations: 13

Improved-throughput networks of basic on-line arithmetic modules for DSP applications

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10000

A. Tenca, Ajay C. Shantilal, M. Sinky

Citations: 4

Register organization for enhanced on-chip parallelism

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10018

R. Sangireddy

Abstract: A large register file with multiple ports is a critical component of a high-performance processor. A large number of registers are necessary for processing a larger number of in-flight instructions to exploit higher instruction level parallelism (ILP). Multiple ports for a register file are necessary to support execution of multiple instructions each cycle. These necessities lead to a larger register access time. However, register access time has to be minimal to enable design of high frequency processors. Analysis of the lifetime of a logical to physical register mapping reveals that there are long latencies between the times a physical register is allocated, consumed, and released. We propose a dual bank register file organization that exploits such long latencies, resulting in a large bandwidth with a reduced register access time. Implementation of one flavor of the proposed register file organization, as compared to a conventional monolithic register file, in an 8-wide out-of-order issue superscalar processor enhanced instructions per cycle (IPC) throughput up to 6% for Spec2000 applications while reducing register access time up to 22%. Another flavor of the register file organization, with a similar access time as the conventional monolithic register file, enhanced the IPC up to 15%. Thus a trade-off between register access time and ILP exploitation is shown.

Citations: 4

Programming transparency and portable hardware interfacing: towards general-purpose reconfigurable computing

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date : 2004-09-27 DOI: 10.1109/ASAP.2004.10028

M. Vuletic, L. Pozzi, P. Ienne

Abstract: Despite enabling significant performance improvements, reconfigurable computing systems have not gained widespread acceptance: most reconfigurable computing paradigms lack (1) a unified and transparent programming model, and (2) a standard interface for integration of hardware accelerators. Ideally, programmers should code algorithms and designers should write hardware accelerators independently of any detail of the underlying platform. We argue that achieving portability and uniform programming with only limited loss of performance is one of the main issues that hinder the widespread acceptance of reconfigurable computing. To make reconfigurable computing globally more attractive, we suggest a transparent, portable, and hardware agnostic programming paradigm. For achieving software code and hardware design portability, platform-specific tasks are delegated to a system-level virtualisation layer that supports a chosen programming model, much in the same way platform details are hidden from users in general-purpose computers. Although an additional abstraction inherently brings overheads, we show that the involvement of the virtualisation layer exposes potential optimisations that compensate the overheads and bring additional speedups. As a case-study, we present a real design and implementation of a number of building blocks of such system and discuss the challenges involved in materialising the others.

Citations: 17