2008 International Conference on Application-Specific Systems, Architectures and Processors: Latest Papers

Concurrent systolic architecture for high-throughput implementation of 3-dimensional discrete wavelet transform

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580172

B. K. Mohanty, P. Meher

Citations: 3

Low-cost implementations of NTRU for pervasive security

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580158

A. C. Atici, L. Batina, Junfeng Fan, I. Verbauwhede, S. Yalcin

Citations: 44

Configurable and scalable high throughput turbo decoder architecture for multiple 4G wireless standards

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580180

Yang Sun, Yuming Zhu, M. Goel, Joseph R. Cavallaro

Abstract: In this paper, we propose a novel multi-code turbo decoder architecture for 4G wireless systems. To support various 4G standards, a configurable multi-mode MAP (maximum a posteriori) decoder is designed for both binary and duo-binary turbo codes with small resource overhead (less than 10%) compared to the single-mode architecture. To achieve high data rates in 4G, we present a parallel turbo decoder architecture with scalable parallelism tailored to the given throughput requirements. High-level parallelism is achieved by employing contention-free interleavers. Multi-banked memory structure and routing network among memories and MAP decoders are designed to operate at full speed with parallel interleavers. We designed a very low-complexity recursive on-line address generator supporting multiple interleaving patterns, which avoids the interleaver address memory. Design trade-offs in terms of area and power efficiency are explored to find the optimal architectures. A 711 Mbps data rate is feasible with 32 Radix-4 MAP decoders running at 200 MHz clock rate.
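The abstract relies on contention-free interleavers so that parallel MAP decoders never hit the same memory bank in the same cycle. A minimal sketch of that property follows. It is not the paper's address generator: the quadratic permutation polynomial (QPP) form, the parameters K=40, f1=3, f2=10, and the function names are assumptions chosen for illustration only.

```python
from typing import List


def qpp_interleaver(k: int, f1: int, f2: int) -> List[int]:
    """Illustrative interleaver: pi(x) = (f1*x + f2*x^2) mod k (assumed form)."""
    return [(f1 * x + f2 * x * x) % k for x in range(k)]


def is_contention_free(pi: List[int], m: int) -> bool:
    """Check that m parallel decoders never address the same bank at once.

    The block of length k is split into m windows of size w = k // m.
    Decoder t reads position j + t*w at step j; after interleaving, that
    address lives in bank pi[j + t*w] // w. The interleaver is
    contention-free for m if those banks are all distinct at every step j.
    """
    k = len(pi)
    if m < 1 or k % m != 0:
        raise ValueError("m must be a positive divisor of the block length")
    w = k // m
    for j in range(w):
        banks = {pi[j + t * w] // w for t in range(m)}
        if len(banks) != m:
            return False
    return True


if __name__ == "__main__":
    pi = qpp_interleaver(40, 3, 10)  # example parameters, not from the paper
    print(is_contention_free(pi, 4))
```

The check mirrors the multi-banked memory picture in the abstract: each window maps to one bank, and a collision at any step would force one decoder to stall.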

Citations: 75

Synthesis of application accelerators on Runtime Reconfigurable Hardware

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580147

M. Alle, Keshavan Varadarajan, R. Ramesh, Joseph Nimmy, Alexander Fell, Adarsha Rao, S. Nandy, R. Narayan

Abstract: Application accelerators are predominantly ASICs. The cost of ASIC solutions is orders of magnitude higher than that of programmable processing cores. Despite this, ASIC solutions are preferred when both high performance and low power are the target. ASICs offer no flexibility to cater to application derivatives unless this has been provisioned for at design time. In this paper we define the architecture of Runtime Reconfigurable Hardware (RRH) as the platform for application acceleration. The proposed RRH is a homogeneous fabric comprising computing, storage and communication resources. We also propose a synthesis methodology to realize applications written in a high-level language (HLL) on the RRH. Applications described in an HLL are compiled into application substructures. For each application substructure, a set of Compute Elements (CEs) is allocated, interconnected in a manner that closely matches the communication pattern within it. CEs in such a configuration are called a hardware affine. Hardware affines are defined at compile time and provisioned (carved out) on the fabric at runtime. Because these hardware affines are not instruction set processor cores or Logic Elements as in FPGAs, we obtain the performance and power advantage of an ASIC together with the hardware reconfigurability/programmability of an FPGA or instruction set processor.

Citations: 12

An efficient digital circuit for implementing Sequence Alignment algorithm in an extended processor

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580171

V. Kundeti, Yunsi Fei, S. Rajasekaran

Abstract: The problem of sequence alignment (Edit Distance) between a pair of strings has been well studied in the field of computing algorithms. The classic dynamic programming-based algorithm, Needleman-Wunsch (O(n^2)), has been widely used in practice, especially by biologists to find similarities between gene sequences. Any optimization in the implementation of this algorithm will have a significant practical impact on biological research. However, within the past several decades, not much has been done to improve the runtime of the algorithm in real implementations. Although algorithms based on systolic processor arrays and FPGAs were presented earlier to create custom hardware to aid in speed-up, their usage has been very limited due to their inherent synchronous design complexity and scalability issues. In view of this, we propose an efficient hardware implementation of the Sequence Alignment algorithm. We provide a simple and efficient asynchronous sequential design which can be readily implemented as an instruction in an extensible processor. Experimental results show that our circuit implementation can achieve a speed-up of 3.77X on average compared with the software counterpart, while also reducing the area cost.
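For readers unfamiliar with the quadratic-time dynamic program the abstract refers to, a minimal software sketch of the edit-distance recurrence is given here. It is the textbook formulation, not the paper's circuit; the unit costs for insertion, deletion and substitution and the function name are assumptions of this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook O(len(a) * len(b)) edit-distance dynamic program.

    Assumes unit cost for insert, delete and substitute. Only two rows
    of the DP table are kept, so memory is O(len(b)).
    """
    # prev[j] holds the distance between the empty prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Each cell depends only on its left, upper and upper-left neighbours, which is the data dependency that hardware implementations of this algorithm exploit.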

Citations: 1

Lightweight DMA management mechanisms for multiprocessors on FPGA

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580191

Antonino Tumeo, M. Monchiero, G. Palermo, Fabrizio Ferrandi, D. Sciuto

Citations: 13

On the high-throughput implementation of RIPEMD-160 hash algorithm

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580159

Miroslav Knezevic, K. Sakiyama, Y. Lee, I. Verbauwhede

Citations: 14

A new high-performance scalable dynamic interconnection for FPGA-based reconfigurable systems

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580155

S. Jovanovic, C. Tanougast, S. Weber

Citations: 33

Memory copies in multi-level memory systems

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580192

P. D. Langen, B. Juurlink

Citations: 3

Run-time thread sorting to expose data-level parallelism

2008 International Conference on Application-Specific Systems, Architectures and Processors Pub Date : 2008-07-02 DOI: 10.1109/ASAP.2008.4580154

Tirath Ramdas, G. Egan, D. Abramson, K. Baldridge

Abstract: We address the problem of data parallel processing for computational quantum chemistry (CQC). CQC is a computationally demanding tool to study the electronic structure of molecules. An important algorithmic component of these computations is the evaluation of Electron Repulsion Integrals (ERIs). A key problem with ERI evaluation is control-flow variation between different ERI evaluations, which can only be resolved at runtime. This causes the computation to be unsuitable for data parallel execution. However, it is observed that although there is variation between ERI evaluations, the variation is limited; in fact there are a limited number of ERI classes present within any given workload. Conceptually, it is possible to classify the ERIs into sizable sets, and execute these sets in a data parallel fashion. Practically, creating these sets is computationally expensive. We describe an architecture to perform this thread sorting, where high throughput is achieved with small associative and multiport memories. The performance of the prototype is evaluated with FPGA synthesis. We go on to envision other uses for thread sorting, in general-purpose manycore architectures.
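The idea of classifying work items into same-class sets and executing each set in a data-parallel fashion can be sketched in software as follows. This is only an illustration of the concept, not the paper's hardware architecture; the function name, the `classify` callback and the fixed batch `width` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def sorted_batches(
    items: Iterable[T],
    classify: Callable[[T], Hashable],
    width: int,
) -> Iterator[Tuple[Hashable, List[T]]]:
    """Group items by class and emit a batch once a class has `width` items.

    Every emitted batch holds items of a single class, so all of them
    would follow the same control flow and could run in lockstep.
    Partially filled bins are flushed at the end.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    bins: Dict[Hashable, List[T]] = defaultdict(list)
    for item in items:
        key = classify(item)
        bins[key].append(item)
        if len(bins[key]) == width:
            yield key, bins.pop(key)  # full batch: ready for parallel execution
    for key, rest in bins.items():
        yield key, rest  # leftover, partially filled batches


if __name__ == "__main__":
    for cls, batch in sorted_batches([0, 1, 3, 2, 6, 4, 9], lambda x: x % 3, 2):
        print(cls, batch)
```

The dictionary of bins plays the role that the abstract assigns to small associative memories: looking up the bin for a class key is the operation that has to be fast for sorting to pay off.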

Citations: 1