Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004: latest papers

Common subexpression elimination involving multiple variables linear DSP synthesis
Anup Hosangadi, F. Fallah, R. Kastner
Abstract: Common subexpression elimination is commonly employed to reduce the number of operations in DSP algorithms after decomposing constant multiplications into shifts and additions. Conventional optimization techniques for finding common subexpressions can optimize constant multiplications with only a single variable at a time, and hence cannot fully optimize the computations with multiple variables found in the matrix form of linear systems such as the DCT and DFT. We transform these computations such that all common subexpressions involving any number of variables can be detected. We then present heuristic algorithms to select the best set of common subexpressions. Experimental results show the superiority of our technique over conventional techniques for common subexpression elimination.
DOI: https://doi.org/10.1109/ASAP.2004.10022
Published: 2004-09-27
Citations: 21
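To make the idea in this abstract concrete, here is a minimal sketch of a common subexpression that spans two variables. It is not the authors' transformation or selection heuristic; the constants 5 and 3, the input names `x1` and `x2`, and both function names are assumptions chosen only for illustration.

```python
def naive(x1, x2):
    """y1 = 5*x1 + 5*x2 and y2 = 3*x1 + 3*x2, each constant expanded
    into shifts and adds on a single variable (6 additions in total)."""
    y1 = ((x1 << 2) + x1) + ((x2 << 2) + x2)
    y2 = ((x1 << 1) + x1) + ((x2 << 1) + x2)
    return y1, y2


def shared(x1, x2):
    """Same outputs, but the two-variable subexpression d = x1 + x2
    is computed once and reused (3 additions in total)."""
    d = x1 + x2            # common subexpression involving both variables
    y1 = (d << 2) + d      # 5 * (x1 + x2)
    y2 = (d << 1) + d      # 3 * (x1 + x2)
    return y1, y2
```

A single-variable optimizer would look at `5*x1` and `5*x2` separately and would not find `d`, which is the limitation the abstract describes.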
A public-key cryptographic processor for RSA and ECC
H. Eberle, N. Gura, S. C. Shantz, Vipul Gupta, L. Rarick, S. Sundaram
Abstract: We describe a general-purpose processor architecture for accelerating public-key computations on server systems that demand high performance and flexibility to accommodate large numbers of secure connections with heterogeneous clients that are likely to be limited in the set of cryptographic algorithms supported. Flexibility is achieved in that the processor supports multiple public-key cryptosystems, namely RSA, DSA, DH, and ECC, arbitrary key sizes and, in the case of ECC, arbitrary curves over fields GF(p) and GF(2^m). At the core of the processor is a novel dual-field multiplier based on a modified carry-save adder (CSA) tree that supports both GF(p) and GF(2^m). In the case of a 64-bit integer multiplier, the necessary modifications increase its size by a mere 5%. To efficiently schedule the multiplier, we implemented a multiply-accumulate instruction that combines several steps of a multiple-precision multiplication in a single operation: multiplication, carry propagation, and partial product accumulation. We have developed a hardware prototype of the cryptographic processor in FPGA technology. If implemented in current 1.5 GHz processor technology, the processor executes 5,265 RSA-1024 op/s and 25,756 ECC-163 op/s; the given key sizes offer comparable security strength. Looking at future security levels, performance is 786 op/s for RSA-2048 and 9,576 op/s for ECC-233.
DOI: https://doi.org/10.1109/ASAP.2004.10015
Published: 2004-09-27
Citations: 83
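The multiply-accumulate step mentioned in this abstract can be illustrated in software. The sketch below is a textbook schoolbook multiplication over 64-bit limbs, not the processor's instruction set; the names `mac`, `mp_mul`, `to_limbs`, and `from_limbs` are hypothetical, and only the GF(p) integer case is shown.

```python
W = 64                      # limb width in bits (assumed, matching the 64-bit multiplier)
MASK = (1 << W) - 1


def mac(a, b, acc, carry):
    """One fused step: a*b + acc + carry, returned as (low limb, carry out).
    For W-bit inputs the result always fits in two limbs."""
    t = a * b + acc + carry
    return t & MASK, t >> W


def mp_mul(x, y):
    """Multiply two little-endian lists of W-bit limbs."""
    out = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            # multiplication, partial-product accumulation, carry propagation
            out[i + j], carry = mac(xi, yj, out[i + j], carry)
        out[i + len(y)] = carry
    return out


def to_limbs(n):
    """Split a non-negative integer into little-endian W-bit limbs."""
    limbs = []
    while True:
        limbs.append(n & MASK)
        n >>= W
        if n == 0:
            return limbs


def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(limb << (W * i) for i, limb in enumerate(limbs))
```

Folding the three steps into `mac` keeps the inner loop to one operation per limb pair, which is the scheduling benefit the abstract attributes to the combined instruction.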
Hyper-programmable architectures for adaptable networked systems
G. Brebner, P. James-Roxby, Eric Keller, C. Kulkarni
Abstract: We explain how modern programmable logic devices have capabilities that are well suited for them to assume a central role in the implementation of networked systems, now and in the future. To date, such devices have featured largely in ASIC substitution roles within networked systems; this usage has been highly successful, allowing faster times to market and reduced engineering costs. We argue that there are many additional opportunities for productively using these devices. The requirement is exposure of their high inherent computational concurrency matched by concurrent memory accessibility, their rich on-chip interconnectivity and their complete programmability, at a higher level of abstraction that matches the implementation needs of networked systems. We discuss specific examples supporting this view, and present a highly flexible soft platform architecture at an appropriate level of abstraction from physical devices. This may be viewed as a particularly configurable and programmable type of network processor, offering scope both for innovative networked system implementation and for new directions in networking research. In particular, it is aimed at facilitating scalable solutions, matching differently resourced programmable logic devices to differing performance and sophistication requirements of networked systems, from cheap consumer appliances to high-end network switching.
DOI: https://doi.org/10.1109/ASAP.2004.10037
Published: 2004-09-27
Citations: 14
A digit-serial algorithm for the discrete logarithm modulo 2^k
A. Fit-Florea, D. Matula
Abstract: We introduce as our main result a digit-serial residue arithmetic algorithm for computing the discrete logarithm modulo 2^k (dlg). "Digit inheritance" is presented as a fundamental property common to the primitive operations modulo 2^k of addition, multiplication, multiplicative inverse, exponentiation and discrete logarithm. Our main algorithm computes dlg using binary arithmetic with 3 as the logarithmic base and has a critical path containing one modulo 2^k multiplication operation for each of its k iterations. Extensions of the algorithm to other logarithmic bases and computations using digits in a higher radix 2^r are also described.
DOI: https://doi.org/10.1109/ASAP.2004.1
Published: 2004-09-27
Citations: 10
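As a rough illustration of what a bit-at-a-time discrete logarithm modulo 2^k can look like, the sketch below relies on the standard fact that, for k ≥ 3, every odd residue a satisfies a ≡ (-1)^s · 3^e (mod 2^k) for a unique sign bit s and exponent 0 ≤ e < 2^(k-2). It is a generic reconstruction written for this listing, not the authors' digit-serial algorithm; the function name and the returned `(s, e)` pair are assumptions.

```python
def dlog_base3_mod_2k(a, k):
    """Return (s, e) with a == (-1)**s * 3**e (mod 2**k), for odd a and k >= 3.
    The exponent e is recovered one bit per iteration, least significant first."""
    if k < 3 or a % 2 == 0:
        raise ValueError("need k >= 3 and odd a")
    m = 1 << k
    a %= m
    # Powers of 3 are congruent to 1 or 3 modulo 8; otherwise negate first.
    s = 1 if a % 8 in (5, 7) else 0
    b = (m - a) % m if s else a

    e = 0
    p = 1        # invariant: p == 3**e (mod 2**k)
    sq = 3       # invariant: sq == 3**(2**i) (mod 2**k)
    for i in range(k - 2):
        # p and b already agree on the low i+2 bits; a mismatch on the next
        # bit means bit i of the exponent is set.
        if (p - b) % (1 << (i + 3)) != 0:
            e |= 1 << i
            p = (p * sq) % m
        sq = (sq * sq) % m
    return s, e
```

For example, with k = 5 the residue 27 gives `(0, 3)` because 3^3 = 27, and the residue 5 gives `(1, 3)` because -27 ≡ 5 (mod 32).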
Automatic synthesis of customized local memories for multicluster application accelerators
M. Kudlur, Kevin Fan, Michael L. Chu, S. Mahlke
Abstract: Distributed local memories, or scratchpads, have been shown to effectively reduce cost and power consumption of application-specific accelerators while maintaining performance. The design of the local memory organization must take several factors into account, including the memory bandwidth and size requirements of the program and the distribution of program data among the memories. In addition, when register structures and function units in the accelerator are clustered, the effects of intercluster communication should be taken into account. This work proposes a technique to synthesize the local memory architecture of a clustered accelerator using a phase-ordered approach. First, the dataflow graph is pre-partitioned to define a performance-centric grouping of the operations. Second, memory synthesis is performed by combining multiple data structures into a set of physical memories that minimizes cost while maintaining a performance threshold. Finally, post-partitioning is performed to determine the final assignment of operations to clusters given the memory organization. Results show that customization reduces memory cost by 2% to 59% over a naive scheme that utilizes one physical memory per program data structure. Further, pre-partitioning is shown to reduce the intercluster communication required to achieve a fixed performance.
DOI: https://doi.org/10.1109/ASAP.2004.10030
Published: 2004-09-27
Citations: 7
Efficient on-chip communications for data-flow IPs
A. Fraboulet, T. Risset
Abstract: We explain a systematic way of interfacing data-flow hardware accelerators (IP) for their integration in a system on chip. We abstract the communication behaviour of the data-flow IP so as to provide a basis for an interface generator. We also explain which parameters this interface generator has to take into account. We validate our interface mechanism by a cycle-accurate, bit-accurate simulation of a SoC integrating a data-flow IP.
DOI: https://doi.org/10.1109/ASAP.2004.10036
Published: 2004-09-27
Citations: 11
Parallel Montgomery multipliers
M. O. Sanu, E. Swartzlander, C. Chase
Abstract: Modular multiplication is an essential operation in virtually all public-key cryptosystems in use today. This work presents four designs for speeding up modular multiplication on application-specific crypto-processors. All the approaches utilize small look-up tables and fast, massively parallel multipliers. Two of the approaches trade off smaller look-up tables for a larger, slightly slower multiplier. The other two approaches use larger look-up tables but a smaller, faster multiplier.
DOI: https://doi.org/10.1109/ASAP.2004.10008
Published: 2004-09-27
Citations: 13
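For readers unfamiliar with the operation these designs accelerate, the following is a minimal software sketch of textbook Montgomery multiplication. It does not model any of the four look-up-table designs in the paper; the function names are hypothetical, and `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute n' = -n^(-1) mod R for R = 2**k. Requires odd n < R."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("need odd modulus n < 2**k")
    r = 1 << k
    return (-pow(n, -1, r)) % r


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^(-1) mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k          # exact division: the low k bits cancel
    return u - n if u >= n else u


def mont_mul(a, b, n, k):
    """Compute a * b mod n via the Montgomery domain."""
    n_prime = montgomery_setup(n, k)
    a_bar = (a << k) % n          # a * R mod n
    b_bar = (b << k) % n          # b * R mod n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)   # a * b * R mod n
    return redc(prod_bar, n, k, n_prime)            # leave the Montgomery domain
```

The conversions into and out of the Montgomery domain only pay off when many multiplications share one modulus, as in modular exponentiation; a single call like `mont_mul(7, 11, 13, 4)` is shown purely to check the arithmetic.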
Improved-throughput networks of basic on-line arithmetic modules for DSP applications
A. Tenca, Ajay C. Shantilal, M. Sinky
Abstract: On-line arithmetic modules were proposed as a way to explore parallelism in arithmetic operations at digit level. By using serial operators that always receive inputs and compute the outputs from most-significant to least-significant digits, on-line arithmetic makes it possible to overlap all arithmetic operations. However, on-line arithmetic modules have an intrinsic on-line delay that impacts the maximum throughput of networks using these modules. The problem is particularly serious for small-precision calculations or deep-pipelined networks. This work presents a solution to the problem. Results of an actual implementation of the solution for some basic arithmetic operators are shown and demonstrate the benefits of the proposed approach. The developed modules are also tested within the framework of an image processing application.
DOI: https://doi.org/10.1109/ASAP.2004.10000
Published: 2004-09-27
Citations: 4
Register organization for enhanced on-chip parallelism
R. Sangireddy
Abstract: A large register file with multiple ports is a critical component of a high-performance processor. A large number of registers are necessary for processing a larger number of in-flight instructions to exploit higher instruction level parallelism (ILP). Multiple ports for a register file are necessary to support execution of multiple instructions each cycle. These necessities lead to a larger register access time. However, register access time has to be minimal to enable design of high frequency processors. Analysis of the lifetime of a logical to physical register mapping reveals that there are long latencies between the times a physical register is allocated, consumed, and released. We propose a dual bank register file organization that exploits such long latencies, resulting in a large bandwidth with a reduced register access time. Implementation of one flavor of the proposed register file organization, as compared to a conventional monolithic register file, in an 8-wide out-of-order issue superscalar processor enhanced instructions per cycle (IPC) throughput up to 6% for Spec2000 applications while reducing register access time up to 22%. Another flavor of the register file organization, with a similar access time as the conventional monolithic register file, enhanced the IPC up to 15%. Thus a trade-off between register access time and ILP exploitation is shown.
DOI: https://doi.org/10.1109/ASAP.2004.10018
Published: 2004-09-27
Citations: 4
Programming transparency and portable hardware interfacing: towards general-purpose reconfigurable computing
M. Vuletic, L. Pozzi, P. Ienne
Abstract: Despite enabling significant performance improvements, reconfigurable computing systems have not gained widespread acceptance: most reconfigurable computing paradigms lack (1) a unified and transparent programming model, and (2) a standard interface for integration of hardware accelerators. Ideally, programmers should code algorithms and designers should write hardware accelerators independently of any detail of the underlying platform. We argue that achieving portability and uniform programming with only limited loss of performance is one of the main issues that hinder the widespread acceptance of reconfigurable computing. To make reconfigurable computing globally more attractive, we suggest a transparent, portable, and hardware agnostic programming paradigm. For achieving software code and hardware design portability, platform-specific tasks are delegated to a system-level virtualisation layer that supports a chosen programming model, much in the same way platform details are hidden from users in general-purpose computers. Although an additional abstraction inherently brings overheads, we show that the involvement of the virtualisation layer exposes potential optimisations that compensate for the overheads and bring additional speedups. As a case study, we present a real design and implementation of a number of building blocks of such a system and discuss the challenges involved in materialising the others.
DOI: https://doi.org/10.1109/ASAP.2004.10028
Published: 2004-09-27
Citations: 17