2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines: Latest Publications

Advanced Components in the Variable Precision Floating-Point Library
Xiaojun Wang, S. Braganza, M. Leeser
DOI: 10.1109/FCCM.2006.21 (https://doi.org/10.1109/FCCM.2006.21). Published: 2006-04-24.
Abstract: Optimal reconfigurable hardware implementations may require arbitrary floating-point formats that do not necessarily conform to IEEE-specified sizes. The authors have previously presented a variable precision floating-point library for use with reconfigurable hardware. They recently added three advanced components to the library: floating-point division, floating-point square root and floating-point accumulation. These components use algorithms that are well suited to FPGA implementation and offer a good tradeoff between area, latency and throughput. The library's floating-point format is both general and flexible: all IEEE formats, including the 64-bit double-precision format, are a subset of it, as are all previously published floating-point formats for reconfigurable hardware. Because every library component supports this generic format, it is easy to build a pipelined, custom data path with the optimal bitwidth for each operation. Using the library can yield more parallelism and lower power dissipation than adhering to a standard format. To further increase parallelism and reduce power dissipation, the library also supports hybrid fixed- and floating-point operations in the same design. The division and square root designs are based on table lookup and Taylor series expansion, and make use of the memories and multipliers embedded in the FPGA. The iterative accumulator uses the library's addition module together with buffering and control logic to achieve performance similar to that of the adder alone. All three are fully pipelined designs with clock speeds comparable to those of the other library components, helping the designer implement fast, complex, pipelined designs.
Citations: 59
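The abstract says the division unit combines a table lookup with a Taylor series expansion. The Python model below is a minimal sketch of one such scheme, assuming a first-order expansion 1/Y ≈ (Yh - Yl)/Yh^2, where Yh is the divisor's significand truncated to its top m fraction bits and Yl is the remainder. The table width `M_BITS = 8` and the name `table_taylor_divide` are hypothetical, and the library itself is hardware, not Python. Since the approximation equals (1/Y)(1 - t^2) with t = Yl/Yh < 2^-m, the relative error stays below 2^-2m.

```python
import math

# Hypothetical table width: the top M_BITS fraction bits of the divisor's
# significand select a table entry (not a parameter taken from the paper).
M_BITS = 8
SCALE = 1 << M_BITS

# Lookup table of 1 / Yh^2 for every truncated significand Yh = 1 + k / 2^m.
RECIP_SQ_TABLE = [1.0 / (1.0 + k / SCALE) ** 2 for k in range(SCALE)]


def table_taylor_divide(x: float, y: float) -> float:
    """Approximate x / y for positive x, y with one table lookup and a
    first-order Taylor correction: 1/Y ~= (Yh - Yl) / Yh^2."""
    if x <= 0 or y <= 0:
        raise ValueError("sketch handles positive operands only")
    mx, ex = math.frexp(x)          # x = mx * 2**ex, mx in [0.5, 1)
    my, ey = math.frexp(y)
    mx, ex = mx * 2.0, ex - 1       # renormalize significands to [1, 2)
    my, ey = my * 2.0, ey - 1
    k = int((my - 1.0) * SCALE)     # high-order fraction bits of the divisor
    yh = 1.0 + k / SCALE            # truncated divisor Yh
    yl = my - yh                    # low-order part Yl, 0 <= Yl < 2^-m
    q = mx * (yh - yl) * RECIP_SQ_TABLE[k]   # only multiplies, no divider
    return math.ldexp(q, ex - ey)   # exponent difference sets the scale
```

In hardware the table would live in embedded block RAM and the two products in embedded multipliers, which matches the resources the abstract mentions; wider tables trade memory for accuracy.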
Automatic Sliding Window Operation Optimization for FPGA-Based
Haiqian Yu, M. Leeser
DOI: 10.1109/FCCM.2006.29 (https://doi.org/10.1109/FCCM.2006.29). Published: 2006-04-24.
Abstract: FPGA-based computing boards are frequently used as hardware accelerators for image processing algorithms based on sliding window operations (SWOs). SWOs are both computationally intensive and data intensive, and they benefit from hardware acceleration with FPGAs, especially in delay-sensitive applications. The current design process requires that, for each application using SWOs with a different window size, image size, etc., a detailed design be completed before a realistic estimate of the achievable speedup can be obtained. We present an automated tool, sliding window operation optimization (SWOOP), that estimates the speedup of a high-performance design before the detailed implementation is complete. The achievable speedup is determined by the area of the FPGA or, more often, by the memory bandwidth to the processing elements. The memory bandwidth to each processing element is a combination of the bandwidth to the FPGA and the efficient use of on-chip RAM as a data cache. SWOOP uses analytic techniques to automatically determine the number of parallel processing elements to implement on the FPGA, the assignment of input and output data to on-board memory, and the organization of data in on-chip memory that most effectively keeps the processing elements busy. The result is a block layout of the final design, its memory architecture, and a measure of the achievable speedup. Compared with manual designs, the estimates obtained using SWOOP are very accurate.
Citations: 34
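One reason on-chip RAM matters as a data cache is data reuse: a k x k window slides by one pixel, so neighboring windows share most of their inputs. The Python sketch below, assuming a simple k x k window sum as the operation, keeps the last k image rows in a line buffer so that each pixel is read from external memory only once. It illustrates the kind of memory organization SWOOP reasons about, not SWOOP itself, and the function name `sliding_window_sum` is hypothetical.

```python
from collections import deque


def sliding_window_sum(image, k):
    """Sum of every k x k window of a 2-D list, streaming the image in raster
    order so that each pixel is fetched from "off-chip" memory exactly once."""
    rows, cols = len(image), len(image[0])
    line_buffer = deque(maxlen=k)   # models on-chip RAM holding the last k rows
    result = []
    for r in range(rows):
        line_buffer.append(image[r])            # one new row arrives per pass
        if len(line_buffer) < k:
            continue                             # window not yet full
        # Vertical sums over the buffered rows, reused by every window position.
        col_sums = [sum(row[c] for row in line_buffer) for c in range(cols)]
        window = sum(col_sums[:k])
        out_row = [window]
        for c in range(k, cols):
            window += col_sums[c] - col_sums[c - k]   # slide right by one column
            out_row.append(window)
        result.append(out_row)
    return result
```

Without the line buffer each output would re-read k^2 pixels from external memory; with it, external traffic drops to one read per pixel, which is why bandwidth and on-chip RAM organization dominate the achievable speedup.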
FPGAs, GPUs and the PS2 - A Single Programming Methodology
Lee W. Howes, P. Price, O. Mencer, Olav Beckmann
DOI: 10.1109/FCCM.2006.42 (https://doi.org/10.1109/FCCM.2006.42). Published: 2006-04-24.
Abstract: Field programmable gate arrays (FPGAs), graphics processing units (GPUs) and the vector units of Sony's PlayStation 2 offer scope for hardware acceleration of applications. Implementing algorithms on multiple architectures can be a long and complicated process. We demonstrate an approach to compiling for FPGAs, GPUs and PS2 vector units from a unified description based on A Stream Compiler (ASC) for FPGAs. As an example, we implement a Monte Carlo simulation using ASC. The unified description allows us to evaluate architecture-specific optimisations on top of a single base description, saving time and effort.
Citations: 4
Design of a Reconfigurable Processor for NIST Prime Field ECC
Kendall Ananyi, Daler N. Rakhmatov
DOI: 10.1109/FCCM.2006.36 (https://doi.org/10.1109/FCCM.2006.36). Published: 2006-04-24.
Abstract: This paper describes a reconfigurable processor that supports basic elliptic curve cryptography (ECC) operations over GF(p), such as modular addition, subtraction, multiplication and inversion. The processor can be configured for any of the five NIST primes, with sizes ranging from 192 to 521 bits.
Citations: 9
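Modular multiplication dominates ECC cost, and the NIST primes are chosen so that reduction needs only additions. For P-192, p = 2^192 - 2^64 - 1, so 2^192 ≡ 2^64 + 1 (mod p), and a 384-bit product split into 64-bit words can be folded back with three additions and a few final subtractions. The Python sketch below shows this well-known fast reduction together with inversion by Fermat's little theorem. It illustrates the arithmetic only; it is not necessarily the reduction or inversion method used by the paper's processor, it covers just one of the five primes, and the function names are hypothetical.

```python
# NIST P-192 prime: p = 2^192 - 2^64 - 1.
P192 = 2**192 - 2**64 - 1
MASK64 = (1 << 64) - 1


def reduce_p192(c: int) -> int:
    """Reduce 0 <= c < p^2 modulo P-192 with additions only, using
    2^192 = 2^64 + 1 (mod p)."""
    assert 0 <= c < P192 * P192
    w = [(c >> (64 * i)) & MASK64 for i in range(6)]   # six 64-bit words

    def pack(a2: int, a1: int, a0: int) -> int:
        return (a2 << 128) | (a1 << 64) | a0

    s1 = pack(w[2], w[1], w[0])
    s2 = pack(0, w[3], w[3])        # w3 * 2^192 = w3 * (2^64 + 1)
    s3 = pack(w[4], w[4], 0)        # w4 * 2^256 = w4 * (2^128 + 2^64)
    s4 = pack(w[5], w[5], w[5])     # w5 * 2^320 = w5 * (2^128 + 2^64 + 1)
    t = s1 + s2 + s3 + s4
    while t >= P192:                # a few final subtractions
        t -= P192
    return t


def mod_mul(a: int, b: int) -> int:
    """Modular multiplication: full product, then fast reduction."""
    return reduce_p192(a * b)


def mod_inv(a: int) -> int:
    """Inverse via Fermat's little theorem: a^(p-2) = a^-1 (mod p), a != 0."""
    return pow(a, P192 - 2, P192)
```

Fermat inversion reuses the multiplier but needs roughly 190 squarings; hardware designs often prefer a binary extended Euclidean algorithm instead, which is a separate design choice not covered here.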
Open Source High Performance Floating-Point Modules
K. Hemmert, K. Underwood
DOI: 10.1109/FCCM.2006.54 (https://doi.org/10.1109/FCCM.2006.54). Published: 2006-04-24.
Abstract: Given the logic density of modern FPGAs, it is feasible to use FPGAs for floating-point applications. However, any floating-point units used must be highly optimized. This paper introduces an open source library of highly optimized floating-point units for Xilinx FPGAs. The units are fully IEEE compliant and achieve an operating frequency of approximately 230 MHz for double-precision add and multiply in a Xilinx Virtex-II Pro FPGA (-7 speed grade). This speed is achieved with a 10-stage adder pipeline and a 12-stage multiplier pipeline. The adder requires 571 slices and the multiplier 905 slices.
Citations: 34
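A double-precision adder performs the usual steps of floating-point addition: unpack, align, add, normalize and round; how this library splits them across its 10 pipeline stages is not stated in the abstract. The Python model below is a minimal sketch of those steps for positive normal IEEE 754 doubles with round-to-nearest-even. It is not the library's HDL, it omits subtraction, signs, subnormals, NaNs and other cases a fully compliant unit must handle, and the helper names are hypothetical.

```python
import struct

EXP_MASK = 0x7FF
FRAC_BITS = 52


def unpack(x: float):
    """Split an IEEE 754 double into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> FRAC_BITS) & EXP_MASK, bits & ((1 << FRAC_BITS) - 1)


def pack(exp: int, frac: int) -> float:
    """Assemble a positive double from a biased exponent and fraction."""
    return struct.unpack(">d", struct.pack(">Q", (exp << FRAC_BITS) | frac))[0]


def fp_add_positive(a: float, b: float) -> float:
    """Round-to-nearest-even double addition for positive normal operands,
    written as the align / add / normalize / round steps of a hardware adder."""
    sa, ea, fa = unpack(a)
    sb, eb, fb = unpack(b)
    if sa or sb or not (0 < ea < EXP_MASK and 0 < eb < EXP_MASK):
        raise ValueError("sketch handles positive normal numbers only")
    ma, mb = fa | (1 << FRAC_BITS), fb | (1 << FRAC_BITS)   # restore hidden bit
    if ea < eb:                                              # make a the larger exponent
        ea, eb, ma, mb = eb, ea, mb, ma
    s = (ma << (ea - eb)) + mb       # align: exact sum, in units of 2^(eb - 1075)
    extra = s.bit_length() - 53      # normalize: bits beyond 53 (always >= 1 here)
    q, rem = s >> extra, s & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    if rem > half or (rem == half and q & 1):   # round to nearest, ties to even
        q += 1
    if q >> 53:                      # rounding carried into a new bit
        q >>= 1
        extra += 1
    exp = eb + extra
    if exp >= EXP_MASK:
        return float("inf")          # overflow
    return pack(exp, q & ((1 << FRAC_BITS) - 1))
```

The model keeps the aligned sum exact with Python integers; hardware instead keeps only guard, round and sticky bits, which is cheaper but yields the same rounded result.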
A Scalable Hybrid Regular Expression Pattern Matcher
J. Moscola, Young-Hee Cho, J. Lockwood
DOI: 10.1109/FCCM.2006.18 (https://doi.org/10.1109/FCCM.2006.18). Published: 2006-04-24.
Abstract: In this paper, the authors present a reconfigurable hardware architecture for finding regular expression patterns in streaming data. The architecture combines two popular pattern matching techniques: a pipelined character grid architecture (Baker, 2004) and a regular expression NFA architecture (Cho, 2006). The resulting hybrid architecture can scale the number of input characters while still being able to scan for regular expression patterns.
Citations: 8
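In an NFA-based matcher, each NFA state can be held in a flip-flop, and all states advance in parallel on every input character. The sketch below models that behavior in Python for a hypothetical pattern a(b|c)*d, with a set of active states standing in for the flip-flops. It does not model Baker's character grid or the multi-character scaling of the hybrid design, and the names `TRANSITIONS` and `stream_match` are hypothetical.

```python
# Hypothetical NFA for the pattern a(b|c)*d:
#   state 0 --a--> 1,  state 1 --b|c--> 1,  state 1 --d--> 2 (accepting)
TRANSITIONS = {
    (0, "a"): {1},
    (1, "b"): {1},
    (1, "c"): {1},
    (1, "d"): {2},
}
START, ACCEPT = 0, 2


def stream_match(text, transitions=TRANSITIONS, start=START, accept=ACCEPT):
    """Return the end positions of all matches in a character stream.

    The set of active states plays the role of one flip-flop per NFA state;
    every state is updated once per input character, as in a hardware NFA."""
    active = {start}
    ends = []
    for i, ch in enumerate(text):
        nxt = {start}                   # start state stays active: match anywhere
        for s in active:
            nxt |= transitions.get((s, ch), set())
        active = nxt
        if accept in active:
            ends.append(i)
    return ends
```

For example, `stream_match("xxabcbdyyad")` reports matches ending at positions 6 and 10. Processing one character per step is the limitation the character-grid half of the hybrid architecture is meant to lift.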
A Task Graph Approach for Efficient Exploitation of Reconfiguration in Dynamically Reconfigurable Systems
Kyprianos Papademetriou, A. Dollas
DOI: 10.1109/FCCM.2006.19 (https://doi.org/10.1109/FCCM.2006.19). Published: 2006-04-24.
Abstract: Partial reconfiguration suffers from inherently high latency and low throughput, which become more significant when reconfiguration is performed on demand. This work addresses this overhead in processors that combine a fixed processing unit (FPU) with a reconfigurable processing unit (RPU). Static and dynamic prefetching (Li, 2002) and instruction forecasting (Iliopoulos and Antonakopoulos, 2001) aim to reduce the overhead by preloading configurations. Banerjee et al. (2005) transform the task graph of an application, and a heuristic algorithm evaluates the reduction in schedule length and selects the most promising configuration; tasks are scheduled according to the physical resource constraints. In this work, the prefetching model of Li (2002) is augmented to take into account the hardware area constraints of a partially reconfigurable system. Given the task graph of an application, tasks with a low probability of execution are split and preloaded according to the available hardware so that it is fully utilized. This increases the time during which reconfiguration overlaps with processor execution.
Citations: 4
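The idea of splitting a low-probability task so that preloaded configurations fill the available area can be illustrated with a small greedy planner. This is a hypothetical model, not Li's prefetching algorithm or the authors' extension of it: `Task`, `plan_prefetch`, the probabilities and the area units are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prob: float   # hypothetical probability that the task executes next
    area: int     # hypothetical configuration size, in reconfigurable area units


def plan_prefetch(tasks, free_area):
    """Greedy preload plan under an area constraint.

    Whole configurations are preloaded in order of decreasing execution
    probability; when the next one no longer fits, only the part that fits is
    preloaded, so the free area is fully used. Returns (name, area) pairs."""
    plan = []
    for t in sorted(tasks, key=lambda t: t.prob, reverse=True):
        if free_area <= 0:
            break
        part = min(t.area, free_area)   # split the task if it does not fit whole
        plan.append((t.name, part))
        free_area -= part
    return plan
```

With tasks A (p = 0.6, 40 units), B (0.3, 50) and C (0.1, 30) and 100 free units, this plan preloads A and B whole and the first 10 units of C, so the remainder of C can be loaded while earlier tasks execute.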
Hierarchical Clustering using Reconfigurable Devices
Shobana Padmanabhan, Moshe Looks, Dan Legorreta, Young-Hee Cho, J. Lockwood
DOI: 10.1109/FCCM.2006.49 (https://doi.org/10.1109/FCCM.2006.49). Published: 2006-04-24.
Abstract: Non-hierarchical k-means algorithms have been implemented in hardware, most frequently for image clustering. Here, we focus on hierarchical clustering of text documents based on document similarity. To our knowledge, this is the first work to present a hierarchical clustering algorithm designed for hardware implementation, and ours is the first hardware-accelerated implementation of such an algorithm.
Citations: 1
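For reference, a plain software version of hierarchical (agglomerative) document clustering repeatedly merges the two most similar clusters until one remains. The sketch below assumes bag-of-words term counts, cosine similarity and single linkage; the abstract does not give the paper's similarity measure, linkage or hardware algorithm, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(docs):
    """Single-link agglomerative clustering; returns the merge sequence."""
    vecs = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        best = None   # (similarity, i, j) of the most similar cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(cosine(vecs[p], vecs[q])
                          for p in clusters[i] for q in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The all-pairs similarity search inside the loop is the costly, highly parallel part, which is what makes the problem a candidate for hardware acceleration.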
Sparse Matrix-Vector Multiplication for Finite Element Method Matrices on FPGAs
Y. El-Kurdi, W. Gross, D. Giannacopoulos
DOI: 10.1109/FCCM.2006.65 (https://doi.org/10.1109/FCCM.2006.65). Published: 2006-04-24.
Abstract: The paper presents the architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from finite element method (FEM) applications. The architecture is based on a pipelined linear array of processing elements (PEs). A hardware-oriented matrix "striping" scheme is developed that reduces the number of required processing elements. The current 8-PE prototype achieves a peak performance of 1.76 GFLOPS and a sustained performance of 1.5 GFLOPS with 8 GB/s of memory bandwidth. The SMVM pipeline uses 30% of the logic resources and 40% of the memory resources of a Stratix S80 FPGA. Because the PEs use only local interconnect, the scalability of the SMVM pipeline is limited by FPGA resources rather than by communication overhead.
Citations: 27
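The kernel being accelerated is y = A x with A sparse. A common software baseline stores A in compressed sparse row (CSR) form and performs one multiply-accumulate, i.e. two floating-point operations, per nonzero. The sketch below assumes CSR storage for illustration only; the paper's hardware uses its own striping scheme, which is not reproduced here, and the function names are hypothetical.

```python
def dense_to_csr(matrix):
    """Convert a dense 2-D list to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """y = A x for A in CSR form: one multiply-accumulate per nonzero."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]   # irregular access into x
        y[r] = acc
    return y
```

Each nonzero needs its value, its column index and an irregular read of x, so SMVM is usually limited by memory bandwidth rather than arithmetic, which is consistent with the paper reporting performance together with its 8 GB/s memory bandwidth.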
High Performance Feature Detection on a Reconfigurable Co-Processor
J. Mar, A. Bissacco, Stefano Soatto, S. Ghiasi
DOI: 10.1109/FCCM.2006.50 (https://doi.org/10.1109/FCCM.2006.50). Published: 2006-04-24.
Abstract: In this paper, the authors propose a new design for the feature detection used in tracking that eliminates the need for a central computer to complete the computations of the feature selection algorithm. A system that relies on a central computer is limited in performance by the delay in transferring data from the camera to the computer for processing. Our design suggests that feature detection can be computed on a processor within the camera, reducing the overall detection time and improving the performance of the overall tracking system. However, such in-camera systems are often constrained by the processing power available to the camera. With Benedetti and Perona's approach to Tomasi and Kanade's detection algorithm, such a design can be implemented on a camera system, eliminating the delay and improving performance over a software-based tracking system.
Citations: 0
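Tomasi and Kanade's detector scores each pixel by the smaller eigenvalue of the 2x2 matrix of summed gradient products over a small window; pixels with a large score are good features to track. The Python sketch below computes that score directly, assuming central-difference gradients and a 3x3 window. It does not reproduce Benedetti and Perona's hardware-oriented variant, and the function name `min_eigenvalue_scores` is hypothetical.

```python
import math


def min_eigenvalue_scores(img, win=1):
    """Tomasi-Kanade corner score (smaller eigenvalue of the 2x2 gradient
    structure matrix summed over a (2*win+1)^2 window) for a 2-D list."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):               # central-difference gradients
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2
    scores = [[0.0] * w for _ in range(h)]
    for y in range(1 + win, h - 1 - win):
        for x in range(1 + win, w - 1 - win):
            sxx = syy = sxy = 0.0
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    a, b = gx[y + dy][x + dx], gy[y + dy][x + dx]
                    sxx += a * a
                    syy += b * b
                    sxy += a * b
            half_trace = (sxx + syy) / 2
            # lambda_min = tr/2 - sqrt((tr/2)^2 - det) for [[sxx, sxy], [sxy, syy]]
            scores[y][x] = half_trace - math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return scores
```

On a bright square, corners get a positive score while points on a straight edge score zero, because an edge has gradient in only one direction. The square root per pixel is the kind of operation that makes a direct hardware mapping expensive.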