2008 International Conference on Application-Specific Systems, Architectures and Processors: Latest Publications

Concurrent systolic architecture for high-throughput implementation of 3-dimensional discrete wavelet transform
B. K. Mohanty, P. Meher
{"title":"Concurrent systolic architecture for high-throughput implementation of 3-dimensional discrete wavelet transform","authors":"B. K. Mohanty, P. Meher","doi":"10.1109/ASAP.2008.4580172","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580172","url":null,"abstract":"In this paper, we present a novel systolic architecture for high-throughput computation of 3-dimensional (3-D) discrete wavelet transform (DWT). The entire 3-D DWT computation is decomposed into three distinct stages and implemented concurrently in a linear array of fully pipelined processing elements (PE). The proposed structure for 3-D DWT provides higher throughput than the existing architecture, involves nearly half or less the number of multipliers and adders, and requires less on-chip memory (when normalized for unit throughput rate). Most importantly, unlike the existing architecture, the proposed one does not require any frame buffer to perform inter-frame DWT computation. The proposed structure has a small latency and can perform 3-D DWT computation with 100% hardware utilization efficiency.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133243548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
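The abstract above describes the 3-D DWT as three distinct stages. A minimal software sketch of that decomposition follows, assuming a one-level unnormalized Haar filter and a row, column, inter-frame stage order; both are illustrative assumptions, since the listing does not name the filter bank or the stage order. The paper runs the stages concurrently in pipelined hardware, whereas this sketch runs them one after another.

```python
def haar_1d(v):
    """One-level unnormalized Haar step (assumed filter): averages, then half-differences."""
    lo = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    hi = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return lo + hi


def dwt3d_one_level(vol):
    """Apply haar_1d along x, then y, then z of vol[z][y][x] (all dimensions even)."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])

    # Stage 1: transform every row (x direction) of every frame.
    s1 = [[haar_1d(vol[z][y]) for y in range(ny)] for z in range(nz)]

    # Stage 2: transform every column (y direction) of every frame.
    s2 = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for x in range(nx):
            col = haar_1d([s1[z][y][x] for y in range(ny)])
            for y in range(ny):
                s2[z][y][x] = col[y]

    # Stage 3: transform across frames (z direction), the inter-frame step.
    s3 = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for y in range(ny):
        for x in range(nx):
            tube = haar_1d([s2[z][y][x] for z in range(nz)])
            for z in range(nz):
                s3[z][y][x] = tube[z]
    return s3
```

For a constant input volume, all energy should end up in the single low-low-low coefficient, which makes a convenient sanity check.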
Low-cost implementations of NTRU for pervasive security
A. C. Atici, L. Batina, Junfeng Fan, I. Verbauwhede, S. Yalcin
{"title":"Low-cost implementations of NTRU for pervasive security","authors":"A. C. Atici, L. Batina, Junfeng Fan, I. Verbauwhede, S. Yalcin","doi":"10.1109/ASAP.2008.4580158","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580158","url":null,"abstract":"NTRU is a public-key cryptosystem based on the shortest vector problem in a lattice which is an alternative to RSA and ECC. This work presents a compact and low power NTRU design that is suitable for pervasive security applications such as RFIDs and sensor nodes. We have designed two architectures, one is only capable of encryption and the other one performs both encryption and decryption. The strategy for the designs includes clock gating of registers, operand isolation and precomputation. This work is also the first one to present a complete NTRU design with encryption/decryption circuitry. Our encryption-only NTRU design has a gate-count of 2.8 kgates and dynamic power consumption of 1.72 µW. Moreover, encryption-decryption NTRU design consumes about 6 µW dynamic power and consists of 10.5 kgates.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121863353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
Configurable and scalable high throughput turbo decoder architecture for multiple 4G wireless standards
Yang Sun, Yuming Zhu, M. Goel, Joseph R. Cavallaro
{"title":"Configurable and scalable high throughput turbo decoder architecture for multiple 4G wireless standards","authors":"Yang Sun, Yuming Zhu, M. Goel, Joseph R. Cavallaro","doi":"10.1109/ASAP.2008.4580180","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580180","url":null,"abstract":"In this paper, we propose a novel multi-code turbo decoder architecture for 4G wireless systems. To support various 4G standards, a configurable multi-mode MAP (maximum a posteriori) decoder is designed for both binary and duo-binary turbo codes with small resource overhead (less than 10%) compared to the single-mode architecture. To achieve high data rates in 4G, we present a parallel turbo decoder architecture with scalable parallelism tailored to the given throughput requirements. High-level parallelism is achieved by employing contention-free interleavers. Multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. We designed a very low-complexity recursive on-line address generator supporting multiple interleaving patterns, which avoids the interleaver address memory. Design trade-offs in terms of area and power efficiency are explored to find the optimal architectures. A 711 Mbps data rate is feasible with 32 Radix-4 MAP decoders running at 200 MHz clock rate.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116680236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 75
Synthesis of application accelerators on Runtime Reconfigurable Hardware
M. Alle, Keshavan Varadarajan, R. Ramesh, Joseph Nimmy, Alexander Fell, Adarsha Rao, S. Nandy, R. Narayan
{"title":"Synthesis of application accelerators on Runtime Reconfigurable Hardware","authors":"M. Alle, Keshavan Varadarajan, R. Ramesh, Joseph Nimmy, Alexander Fell, Adarsha Rao, S. Nandy, R. Narayan","doi":"10.1109/ASAP.2008.4580147","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580147","url":null,"abstract":"Application accelerators are predominantly ASICs. The cost of ASIC solutions is orders of magnitude higher than that of programmable processing cores. Despite this, ASIC solutions are preferred when both high performance and low power are the target. ASICs offer no flexibility to cater to application derivatives, unless this has been provisioned for at the time of design. In this paper we define the architecture of Runtime Reconfigurable Hardware (RRH) as the platform for application acceleration. The proposed RRH is a homogeneous fabric comprising computing, storage and communicating resources. We also propose a synthesis methodology to realize applications written in a high level language (HLL) on the RRH. Applications described in HLL are compiled into application substructures. For each application substructure, a set of Compute Elements (CEs), interconnected in a manner that closely matches the communication pattern within it, is allocated. CEs in such a configuration are called a hardware affine. Hardware affines are defined at compile time and are carved out (provisioned) on the RRH fabric at runtime. By virtue of the fact that these hardware affines are NOT instruction set processor cores or Logic Elements as in FPGAs, we bear the performance and power advantage of an ASIC, and the hardware reconfigurability/programmability of that of an FPGA/Instruction Set Processor.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"15 7-8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132844933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
An efficient digital circuit for implementing Sequence Alignment algorithm in an extended processor
V. Kundeti, Yunsi Fei, S. Rajasekaran
{"title":"An efficient digital circuit for implementing Sequence Alignment algorithm in an extended processor","authors":"V. Kundeti, Yunsi Fei, S. Rajasekaran","doi":"10.1109/ASAP.2008.4580171","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580171","url":null,"abstract":"The problem of sequence alignment (Edit Distance) between a pair of strings has been well studied in the field of computing algorithms. The classic dynamic programming-based algorithm, Needleman-Wunsch (O(n^2)), has been widely used in practice, especially by biologists to find similarities between gene sequences. Any optimization in the implementation of this algorithm will have a significant practical impact on biological research. However, within the past several decades, not much has been done in improving the runtime of the algorithm in real implementations. Although algorithms based on systolic processor arrays and FPGAs were presented earlier to create custom hardware to aid in speed-up, their usage has been very limited due to their inherent synchronous design complexity and scalability issues. In view of this, we propose an efficient hardware implementation of the Sequence Alignment algorithm. We provide a simple and efficient asynchronous sequential design which can be readily implemented as an instruction in an extensible processor. Experimental results show that our circuit implementation can achieve a speed-up of 3.77X on average compared with the software counterpart, meanwhile reducing the area cost.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"798 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131600568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
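The abstract above refers to the quadratic-time dynamic program behind sequence alignment and edit distance. As a rough software baseline, the sketch below computes unit-cost edit distance with that recurrence. The unit insert/delete/substitute costs and the function name `edit_distance` are assumptions for illustration; this is not the paper's circuit, nor its scoring scheme.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance using the classic O(len(a) * len(b)) dynamic program."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so memory is linear in `len(b)` while time stays quadratic. Each cell depends only on its left, upper, and upper-left neighbours.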
Lightweight DMA management mechanisms for multiprocessors on FPGA
Antonino Tumeo, M. Monchiero, G. Palermo, Fabrizio Ferrandi, D. Sciuto
{"title":"Lightweight DMA management mechanisms for multiprocessors on FPGA","authors":"Antonino Tumeo, M. Monchiero, G. Palermo, Fabrizio Ferrandi, D. Sciuto","doi":"10.1109/ASAP.2008.4580191","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580191","url":null,"abstract":"This paper presents a multiprocessor system on FPGA that adopts Direct Memory Access (DMA) mechanisms to move data between the external memory and the local memory of each processor. The system integrates all standard DMA primitives via a fast Application Programming Interface (API) and relies on interrupts, with the additional possibility of managing a command list. This interface makes it possible to program the embedded multiprocessor architecture on FPGA with simple DMAs using the same DMA techniques adopted on high performance multiprocessors with complex DMA controllers. Several experiments demonstrate the performance of our solution, allowing 57% improvement on the execution time of a selected set of benchmarks. We furthermore show how some DMA programming techniques (double and multi-buffering) can be effectively used within our platform, thus easing the design and development of the hardware and the software in a reconfigurable DMA-based environment.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"60 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114023770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
On the high-throughput implementation of RIPEMD-160 hash algorithm
Miroslav Knezevic, K. Sakiyama, Y. Lee, I. Verbauwhede
{"title":"On the high-throughput implementation of RIPEMD-160 hash algorithm","authors":"Miroslav Knezevic, K. Sakiyama, Y. Lee, I. Verbauwhede","doi":"10.1109/ASAP.2008.4580159","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580159","url":null,"abstract":"In this paper we present two new architectures of the RIPEMD-160 hash algorithm for high throughput implementations. The first architecture achieves the iteration bound of RIPEMD-160, i.e. it achieves a theoretical upper bound on throughput at the micro-architecture level. The second architecture is designed by performing a gate level optimization and achieves a better performance than the first one at the cost of a larger gate area. Throughputs of 3.122 Gbps and 624 Mbps are achieved, with and without pipelining, respectively.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124100888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
A new high-performance scalable dynamic interconnection for FPGA-based reconfigurable systems
S. Jovanovic, C. Tanougast, S. Weber
{"title":"A new high-performance scalable dynamic interconnection for FPGA-based reconfigurable systems","authors":"S. Jovanovic, C. Tanougast, S. Weber","doi":"10.1109/ASAP.2008.4580155","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580155","url":null,"abstract":"Networks on chip (NoCs) present viable interconnection architectures which are especially characterized by a high level of parallelism, high performance and scalability. The NoC architectures already proposed in the literature are mostly intended for system-on-chip (SoC) designs. These NoCs are not suitable for an FPGA-based reconfigurable system. In this paper, we present a new high-performance interconnection approach intended for FPGA-based reconfigurable systems. Our proposed NoC is based on a scalable communication unit characterized by its particular architecture, an arbitration policy based on the priority-to-the-right rule and high performance. We present the basic concept of this communication approach and we demonstrate its feasibility on examples through simulation. Implementation results are also detailed.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129075826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
Memory copies in multi-level memory systems
P. D. Langen, B. Juurlink
{"title":"Memory copies in multi-level memory systems","authors":"P. D. Langen, B. Juurlink","doi":"10.1109/ASAP.2008.4580192","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580192","url":null,"abstract":"Data movement operations, such as the C-style memcpy function, are often used to duplicate or communicate data. This type of function typically produces a significant amount of off-chip traffic. For current microprocessors, communication with off-chip memory is an increasing limitation to attain higher performance as well as a significant source of energy consumption. To decrease the amount of communication between a CPU and the off-chip memory system, we propose a system that implements a hardware memcpy in the memory level where the source data is located.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114572169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Run-time thread sorting to expose data-level parallelism
Tirath Ramdas, G. Egan, D. Abramson, K. Baldridge
{"title":"Run-time thread sorting to expose data-level parallelism","authors":"Tirath Ramdas, G. Egan, D. Abramson, K. Baldridge","doi":"10.1109/ASAP.2008.4580154","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580154","url":null,"abstract":"We address the problem of data parallel processing for computational quantum chemistry (CQC). CQC is a computationally demanding tool to study the electronic structure of molecules. An important algorithmic component of these computations is the evaluation of Electron Repulsion Integrals (ERIs). A key problem with ERI evaluation is control-flow variation between different ERI evaluations, which can only be resolved at runtime. This causes the computation to be unsuitable for data parallel execution. However, it is observed that although there is variation between ERI evaluations, the variation is limited; in fact there are a limited number of ERI classes present within any given workload. Conceptually, it is possible to classify the ERIs into sizable sets, and execute these sets in a data parallel fashion. Practically, creating these sets is computationally expensive. We describe an architecture to perform this thread sorting, where high throughput is achieved with small associative and multiport memories. The performance of the prototype is evaluated with FPGA synthesis. We go on to envision other uses for thread sorting, in general-purpose manycore architectures.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126946796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
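The idea in the abstract above, classifying work items at run time so that each class can be executed as one data-parallel batch, can be sketched in software as follows. The names `sort_threads_by_class`, `run_batches`, `classify`, and `kernels` are hypothetical, and the class labels in the usage note are made up. The paper performs the sorting in hardware with associative and multiport memories, which this sketch does not model.

```python
from collections import defaultdict


def sort_threads_by_class(work_items, classify):
    """Group work items by class so each group follows the same control flow."""
    buckets = defaultdict(list)
    for item in work_items:
        buckets[classify(item)].append(item)
    return dict(buckets)


def run_batches(buckets, kernels):
    """Run one kernel per class over its whole batch (a stand-in for SIMD execution)."""
    results = {}
    for cls, items in buckets.items():
        kernel = kernels[cls]  # a single code path for the entire batch
        results[cls] = [kernel(x) for x in items]
    return results
```

For example, items tagged `"ss"` and `"pp"` would land in two buckets, and each bucket would then be processed by its own kernel without per-item branching.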