{"title":"Advanced Components in the Variable Precision Floating-Point Library","authors":"Xiaojun Wang, S. Braganza, M. Leeser","doi":"10.1109/FCCM.2006.21","DOIUrl":"https://doi.org/10.1109/FCCM.2006.21","url":null,"abstract":"Optimal reconfigurable hardware implementations may require the use of arbitrary floating-point formats that do not necessarily conform to IEEE specified sizes. The authors have previously presented a variable precision floating-point library for use with reconfigurable hardware. The authors recently added three advanced components: floating-point division, floating-point square root and floating-point accumulation to our library. These advanced components use algorithms that are well suited to FPGA implementations and exhibit a good tradeoff between area, latency and throughput. The floating-point format of our library is both general and flexible. All IEEE formats, including 64-bit double-precision format, are a subset of our format. All previously published floating-point formats for reconfigurable hardware are a subset of our format as well. The generic floating-point format supported by all of our library components makes it easy and convenient to create a pipelined, custom data path with optimal bitwidth for each operation. Our library can be used to achieve more parallelism and less power dissipation than adhering to a standard format. To further increase parallelism and reduce power dissipation, our library also supports hybrid fixed and floating point operations in the same design. The division and square root designs are based on table lookup and Taylor series expansion, and make use of memories and multipliers embedded on the FPGA chip. The iterative accumulator utilizes the library addition module as well as buffering and control logic to achieve performance similar to that of the addition by itself. They are all fully pipelined designs with clock speed comparable to that of other library components to aid the designer in implementing fast, complex, pipelined designs","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127748443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Sliding Window Operation Optimization for FPGA-Based","authors":"Haiqian Yu, M. Leeser","doi":"10.1109/FCCM.2006.29","DOIUrl":"https://doi.org/10.1109/FCCM.2006.29","url":null,"abstract":"FPGA-based computing boards are frequently used as hardware accelerators for image processing algorithms based on sliding window operations (SWOs). SWOs are both computationally intensive and data intensive and benefit from hardware acceleration with FPGAs, especially for delay sensitive applications. The current design process requires that, for each specific application using SWOs with different size of window, image, etc.; a detail design must be completed before a realistic estimate of the achievable speedup can be obtained. We present an automated tool, sliding window operation optimization (SWOOP), that generates the estimate of speedup for a high performance design before detailed implementation is complete. The achievable speedup is determined by the area of the FPGA, or, more often, the memory bandwidth to the processing elements. The memory bandwidth to each processing element is a combination of bandwidth to the FPGA and the efficient use of on-chip RAM as a data cache. SWOOP uses analytic techniques to automatically determine the number of parallel processing elements to implement on the FPGA, the assignment of input and output data to on-board memory, and the organization of data in on-chip memory to most effectively keep the processing elements busy. The result is a block layout of the final design, its memory architecture, and a measure of the achievable speedup. The results, compared to manual designs, show that the estimates obtained usinq SWOOP are very accurate","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"11221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114132067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGAs, GPUs and the PS2 - A Single Programming Methodology","authors":"Lee W. Howes, P. Price, O. Mencer, Olav Beckmann","doi":"10.1109/FCCM.2006.42","DOIUrl":"https://doi.org/10.1109/FCCM.2006.42","url":null,"abstract":"Field programmable gate arrays (FPGAs), graphics processing units (GPUs) and Sony's Playstation 2 vector units offer scope for hardware acceleration of applications. Implementing algorithms on multiple architectures can be a long and complicated process. We demonstrate an approach to compiling for FPGAs, GPUs and PS2 vector units using a unified description based on A Stream Compiler (ASC) for FPGAs. As an example of its use we implement a Monte Carlo simulation using ASC. The unified description allows us to evaluate optimisations for specific architectures on top of a single base description, saving time and effort","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121975595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a Reconfigurable Processor for NIST Prime Field ECC","authors":"Kendall Ananyi, Daler N. Rakhmatov","doi":"10.1109/FCCM.2006.36","DOIUrl":"https://doi.org/10.1109/FCCM.2006.36","url":null,"abstract":"This paper describes a reconfigurable processor that provides support for basic elliptic curve cryptographic (ECC) operations over GF(p), such as modular addition, subtraction, multiplication, and inversion. The proposed processor can be configured for any of the five NIST primes with sizes ranging from 192 to 521 bits","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116471479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open Source High Performance Floating-Point Modules","authors":"K. Hemmert, K. Underwood","doi":"10.1109/FCCM.2006.54","DOIUrl":"https://doi.org/10.1109/FCCM.2006.54","url":null,"abstract":"Given the logic density of modern FPGAs, it is feasible to use FPGAs for floating-point applications. However, it is important that any floating-point units that are used be highly optimized. This paper introduces an open source library of highly optimized floating-point units for Xilinx FPGAs. The units are fully IEEE compliant and acheive approximately 230 MHz operation frequency for double-precision add and multiply in a Xilinx Virtex-2-Pro FPGA (-7 speed grade). This speed is acheived with a 10 stage adder pipeline and a 12 stage multiplier pipeline. The area requirement is 571 slices for the adder and 905 slices for the multiplier","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114490199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Scalable Hybrid Regular Expression Pattern Matcher","authors":"J. Moscola, Young-Hee Cho, J. Lockwood","doi":"10.1109/FCCM.2006.18","DOIUrl":"https://doi.org/10.1109/FCCM.2006.18","url":null,"abstract":"In this paper, the authors present a reconfigurable hardware architecture for searching for regular expression patterns in streaming data. This new architecture is created by combining two popular pattern matching techniques: a pipelined character grid architecture (Baker, 2004), and a regular expression NFA architecture (Cho, 2006). The resulting hybrid architecture can scale the number of input characters while still maintaining the ability to scan for regular expression patterns","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115037823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Task Graph Approach for Efficient Exploitation of Reconfiguration in Dynamically Reconfigurable Systems","authors":"Kyprianos Papademetriou, A. Dollas","doi":"10.1109/FCCM.2006.19","DOIUrl":"https://doi.org/10.1109/FCCM.2006.19","url":null,"abstract":"Partial reconfiguration suffers from the inherent high latency and low throughput which is more considerable when reconfiguration is performed on-demand. This work deals with this overhead in processors combining a fixed processing unit (FPU), and a reconfigurable processing unit (RPU). Static and dynamic prefetching (Li, 2002), and instruction forecasting (Iliopoulos and Antonakopoulos, 2001) are targeting at reduction of the overhead through preloading of configurations. Banerjee et al. (2005) transform the task graph of an application and a heuristic algorithm evaluates the reduction in schedule length and selects the most promising configuration. Tasks are scheduled according to the physical resource constraints. In this work the prefetching model of Li (2002) was augmented by taking into account the hardware area constraints of a partially reconfigurable system. Given the task graph of an application, tasks with low probability to be executed are split and preloaded according to the hardware in order to be fully utilized. Thus, the time during which reconfiguration is overlapped with processor execution is increased","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132638599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Clustering using Reconfigurable Devices","authors":"Shobana Padmanabhan, Moshe Looks, Dan Legorreta, Young-Hee Cho, J. Lockwood","doi":"10.1109/FCCM.2006.49","DOIUrl":"https://doi.org/10.1109/FCCM.2006.49","url":null,"abstract":"Non-hierarchical k-means algorithms have been implemented in hardware, most frequently for image clustering. Here, we focus on hierarchical clustering of text documents based on document similarity. To our knowledge, this is the first work to present a hierarchical clustering algorithm designed for hardware implementation and ours is the first hardware-accelerated implementation","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133253826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Matrix-Vector Multiplication for Finite Element Method Matrices on FPGAs","authors":"Y. El-Kurdi, W. Gross, D. Giannacopoulos","doi":"10.1109/FCCM.2006.65","DOIUrl":"https://doi.org/10.1109/FCCM.2006.65","url":null,"abstract":"The paper presents an architecture and an implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from finite element method (FEM) applications. The architecture is based on a pipelined linear array of processing elements (PEs). A hardware-oriented matrix \"striping\" scheme is developed which reduces the number of required processing elements. The current 8 PE prototype achieves a peak performance of 1.76 GFLOPS and a sustained performance of 1.5 GFLOPS with 8 GB/s of memory bandwidth. The SMVM-pipeline uses 30% of the logic resources and 40% of the memory resources of a Stratix S80 FPGA. By virtue of the local interconnect between the PEs, the SMVM-pipeline obtain scalability features that is only limited by FPGA resources instead of the communication overhead","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121538335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance Feature Detection on a Reconfigurable Co-Processor","authors":"J. Mar, A. Bissacco, Stefano Soatto, S. Ghiasi","doi":"10.1109/FCCM.2006.50","DOIUrl":"https://doi.org/10.1109/FCCM.2006.50","url":null,"abstract":"In this paper, the authors propose a new design for feature detection used for tracking, which eliminates the need of a central computer to complete computations for the feature selection algorithm. Such a system constrains performance due to the delay in which data is transferred from camera to computer for processing. Our design suggests that feature detection computation can be done on a processor within the camera helping to reduce overall computation time for detection and increase performance for overall tracking system. However, these systems are often constrained by the processing power available to the camera. But with Benedetti and Perona's approach to Tomasi and Kanade's detection algorithm, such a design is possible to implement onto a camera system which would eliminate the delay and also improve performance over a tracking system designed on software","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121792746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}