2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines: latest papers

A Programmable, Maximal Throughput Architecture for Neighborhood Image Processing
R. Porter, J. Frigo, M. Gokhale, C. Wolinski, François Charot, Charles Wagner
{"title":"A Programmable, Maximal Throughput Architecture for Neighborhood Image Processing","authors":"R. Porter, J. Frigo, M. Gokhale, C. Wolinski, François Charot, Charles Wagner","doi":"10.1109/FCCM.2006.13","DOIUrl":"https://doi.org/10.1109/FCCM.2006.13","url":null,"abstract":"The authors propose a run-time reconfigurable architecture for local neighborhood image processing and discuss how it can offer improved flexibility to the developer. They show that, for a satellite image feature extraction application, the architecture, implemented on Stratix II and Virtex 2 field-programmable gate arrays, achieves performance, hardware resource utilization, and throughput similar to those of a fully pipelined systolic array architecture.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125214075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Rapid Design of Special-Purpose Pipeline Processors with FPGAs and its Application to Computational Fluid Dynamics
G. Lienhart, G. M. Martinez, A. Kugel, R. Männer
{"title":"Rapid Design of Special-Purpose Pipeline Processors with FPGAs and its Application to Computational Fluid Dynamics","authors":"G. Lienhart, G. M. Martinez, A. Kugel, R. Männer","doi":"10.1109/FCCM.2006.60","DOIUrl":"https://doi.org/10.1109/FCCM.2006.60","url":null,"abstract":"This paper presents a framework for rapid development of FPGA based custom processors based on floating-point calculation units. The framework consists of a fully parameterized floating-point library, an easy-to-use pipeline generator and an interface generator for memory and I/O-modules. The performance of this approach is shown for the implementation of an SPH-algorithm.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123976389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Packet Switched vs. Time Multiplexed FPGA Overlay Networks
Nachiket Kapre, Nikil Mehta, Michael DeLorimier, Raphael Rubin, Henry Barnor, M. J. Wilson, M. Wrighton, A. DeHon
{"title":"Packet Switched vs. Time Multiplexed FPGA Overlay Networks","authors":"Nachiket Kapre, Nikil Mehta, Michael DeLorimier, Raphael Rubin, Henry Barnor, M. J. Wilson, M. Wrighton, A. DeHon","doi":"10.1109/FCCM.2006.55","DOIUrl":"https://doi.org/10.1109/FCCM.2006.55","url":null,"abstract":"Dedicated, spatially configured FPGA interconnect is efficient for applications that require high throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g. wiring up gates and datapaths). Applications which virtualize PEs may require a large number of distinct PE-to-PE connections (e.g. using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE's operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources for all possible connections is costly and inefficient. Alternatively, we can time share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g. area, route latency, route quality) between time-multiplexed and packet-switched networks overlaid on top of commodity FPGAs. We demonstrate modular and scalable networks which operate on a Xilinx XC2V6000-4 at 166 MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When applying designs to equivalent area, packet-switching is up to 2 times faster for small area designs while time-multiplexing is up to 5 times faster for larger area designs. When limited to the capacity of a XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however, when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127729400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 150
General Architecture for Hardware Implementation of Genetic Algorithm
Tatsuhiro Tachibana, Y. Murata, N. Shibata, K. Yasumoto, Minoru Ito
{"title":"General Architecture for Hardware Implementation of Genetic Algorithm","authors":"Tatsuhiro Tachibana, Y. Murata, N. Shibata, K. Yasumoto, Minoru Ito","doi":"10.1109/FCCM.2006.43","DOIUrl":"https://doi.org/10.1109/FCCM.2006.43","url":null,"abstract":"In this paper, the authors propose a technique to flexibly implement genetic algorithms (GAs) for various problems on FPGAs. For the purpose, the authors propose a common architecture for GA. The proposed architecture allows designers to easily implement a GA as a hardware circuit consisting of parallel pipelines which execute GA operations. The proposed architecture is scalable to increase the number of parallel pipelines. The architecture is applicable to various problems and allows designers to estimate the size of resulting circuits. The authors give a model for predicting the size of resulting circuits from given parameters. Based on the proposed method, the authors have implemented a tool to facilitate GA circuit design and development. Through experiments using knapsack problem and traveling salesman problem (TSP), the authors show that the FPGA circuits synthesized based on the proposed method run much faster and consume much lower power than software implementation on a PC and the model can predict the size of the resulting circuit accurately enough","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"951 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129663204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Efficient Hardware Generation of Random Variates with Arbitrary Distributions
David B. Thomas, W. Luk
{"title":"Efficient Hardware Generation of Random Variates with Arbitrary Distributions","authors":"David B. Thomas, W. Luk","doi":"10.1109/FCCM.2006.39","DOIUrl":"https://doi.org/10.1109/FCCM.2006.39","url":null,"abstract":"This paper presents a technique for efficiently generating random numbers from a given probability distribution. This is achieved by using a generic hardware architecture, which transforms uniform random numbers according to a distribution mapping stored in RAM, and a software approximation generator that creates distribution mappings for any given target distribution. This technique has many features not found in current non-uniform random number generators, such as the ability to adjust the target distribution while the generator is running, per-cycle switching between distributions, and the ability to generate distributions with discontinuities in the probability density function","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124311282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
The STAR-C Truth: Analyzing Reconfigurable Supercomputing Reliability
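The abstract above describes transforming uniform random numbers through a distribution mapping held in RAM, with software producing the mapping for a chosen target distribution. As a rough software illustration of that general idea only, the sketch below tabulates an approximate inverse CDF and indexes it with a uniform random integer. It is not the authors' hardware architecture or their approximation generator; the names `build_mapping` and `sample`, the table size of 256, and the step-shaped density are all assumptions made for the example.

```python
import bisect
import random


def build_mapping(pdf, lo, hi, entries=256):
    """Tabulate an approximate inverse CDF of `pdf` on [lo, hi] (hypothetical helper)."""
    # Integrate the density on a fine midpoint grid to obtain a running CDF.
    steps = entries * 16
    width = (hi - lo) / steps
    xs = [lo + (i + 0.5) * width for i in range(steps)]
    cdf, total = [], 0.0
    for x in xs:
        total += pdf(x) * width
        cdf.append(total)
    cdf = [c / total for c in cdf]  # normalise so the CDF ends at 1
    # For each equally spaced probability, store the matching x value.
    table = []
    for k in range(entries):
        p = (k + 0.5) / entries
        idx = min(bisect.bisect_left(cdf, p), steps - 1)
        table.append(xs[idx])
    return table


def sample(table, rng):
    """Map one uniform random index through the stored table."""
    return table[rng.randrange(len(table))]


def step_pdf(x):
    """Example density with a discontinuity at x = 0.5 (three times higher above it)."""
    return 1.0 if x < 0.5 else 3.0


table = build_mapping(step_pdf, 0.0, 1.0)
rng = random.Random(1)
draws = [sample(table, rng) for _ in range(10000)]
frac_upper = sum(d >= 0.5 for d in draws) / len(draws)
print(f"fraction of draws in the upper half: {frac_upper:.3f}")
```

Because the distribution lives entirely in the table, swapping in a different table changes the output distribution without touching `sample`, which loosely mirrors the run-time adjustability the abstract mentions.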
H. Quinn, D. Bhaduri, C. Teuscher, P. Graham, M. Gokhale
{"title":"The STAR-C Truth: Analyzing Reconfigurable Supercomputing Reliability","authors":"H. Quinn, D. Bhaduri, C. Teuscher, P. Graham, M. Gokhale","doi":"10.1109/FCCM.2006.70","DOIUrl":"https://doi.org/10.1109/FCCM.2006.70","url":null,"abstract":"In this abstract, the authors present an overview of a reliability analysis toolset, called the scalable tool for the analysis of reliable systems (STAR systems), with modules for determining the reliability of FPGA designs (STAR-circuits) and reconfigurable supercomputers (STAR-reconfigurable supercomputers).","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115404960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Hardware/Software Approach to Molecular Dynamics on Reconfigurable Computers
R. Scrofano, M. Gokhale, F. Trouw, V. Prasanna
{"title":"Hardware/Software Approach to Molecular Dynamics on Reconfigurable Computers","authors":"R. Scrofano, M. Gokhale, F. Trouw, V. Prasanna","doi":"10.1109/FCCM.2006.46","DOIUrl":"https://doi.org/10.1109/FCCM.2006.46","url":null,"abstract":"With advances in reconfigurable hardware, especially field-programmable gate arrays (FPGAs), it has become possible to use reconfigurable hardware to accelerate complex applications, such as those in scientific computing. This has led to the development of reconfigurable computers: computers which have both general purpose processors and reconfigurable hardware, as well as memory and high-performance interconnection networks. In this paper, we study the acceleration of molecular dynamics simulations using reconfigurable computers. We describe how we partition the application between software and hardware and then model the performance of several alternatives for the task mapped to hardware. We describe an implementation of one of these alternatives on a reconfigurable computer and demonstrate that for two real-world simulations, it achieves a 2 times speed-up over the software baseline. We then compare our design and results to those of prior efforts and explain the advantages of the hardware/software approach, including flexibility.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125397939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 70
Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components
R. Strzodka, Dominik Göddeke
{"title":"Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components","authors":"R. Strzodka, Dominik Göddeke","doi":"10.1109/FCCM.2006.57","DOIUrl":"https://doi.org/10.1109/FCCM.2006.57","url":null,"abstract":"FPGAs are becoming more and more attractive for high precision scientific computations. One of the main problems in efficient resource utilization is the quadratically growing resource usage of multipliers depending on the operand size. Many research efforts have been devoted to the optimization of individual arithmetic and linear algebra operations. In this paper the authors take a higher level approach and seek to reduce the intermediate computational precision on the algorithmic level by optimizing the accuracy towards the final result of an algorithm, here the accurate solution of partial differential equations (PDEs). Using the Poisson problem as a typical PDE example, the authors show that most intermediate operations can be computed with floats or even smaller formats and only very few operations (e.g. 1%) must be performed in double precision to obtain the same accuracy as a full double precision solver. Thus the FPGA can be configured with many parallel float operations rather than a few resource-hungry double operations. To achieve this, the authors adapt the general concept of mixed precision iterative refinement methods to FPGAs and develop a fully pipelined version of the conjugate gradient solver. The authors combine this solver with different iterative refinement schemes and precision combinations to obtain resource efficient mappings of the pipelined algorithm core onto the FPGA.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123480626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 79
A Multithreaded Soft Processor for SoPC Area Reduction
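The abstract above combines a low-precision conjugate gradient solver with mixed precision iterative refinement. The sketch below illustrates that general scheme in plain software for a small 1D Poisson system: residuals and solution updates are kept in double precision, while the inner conjugate gradient solve stores its values rounded to single precision. It is not the authors' pipelined FPGA design. Single precision is only emulated by rounding through `struct`, and the function names, the system size, the right-hand side, and the iteration counts are assumptions chosen for the example.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def apply_poisson(v, low=False):
    """Multiply by the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n, out = len(v), []
    for i in range(n):
        s = 2.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(f32(s) if low else s)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg_low(b, iters):
    """Conjugate gradient with every stored value rounded to emulated float32."""
    x = [0.0] * len(b)
    r = [f32(v) for v in b]
    p = r[:]
    rr = f32(dot(r, r))
    for _ in range(iters):
        ap = apply_poisson(p, low=True)
        pap = dot(p, ap)
        if rr == 0.0 or pap <= 0.0:
            break  # nothing left to gain within the emulated format
        alpha = f32(rr / pap)
        x = [f32(xi + alpha * pi) for xi, pi in zip(x, p)]
        r = [f32(ri - alpha * ai) for ri, ai in zip(r, ap)]
        rr_new = f32(dot(r, r))
        p = [f32(ri + (rr_new / rr) * pi) for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def refine(b, outer, inner):
    """Iterative refinement: double-precision residuals, low-precision corrections."""
    x = [0.0] * len(b)
    for _ in range(outer):
        ax = apply_poisson(x)                    # double precision
        r = [bi - ai for bi, ai in zip(b, ax)]   # double-precision residual
        c = cg_low(r, inner)                     # low-precision correction solve
        x = [xi + ci for xi, ci in zip(x, c)]    # double-precision update
    return x


def residual_norm(x, b):
    """Maximum absolute residual, evaluated in double precision."""
    return max(abs(bi - ai) for bi, ai in zip(b, apply_poisson(x)))


n = 16
b = [1.0 / (i + 1) for i in range(n)]
res_single = residual_norm(cg_low(b, 40), b)
res_mixed = residual_norm(refine(b, outer=8, inner=40), b)
print(f"low precision only: {res_single:.2e}, with refinement: {res_mixed:.2e}")
```

Only the residual evaluation and the solution update in `refine` run in double precision, which is the sense in which a small share of the work carries the final accuracy; everything inside `cg_low` uses the emulated narrow format.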
B. Fort, D. Capalija, Z. Vranesic, S. Brown
{"title":"A Multithreaded Soft Processor for SoPC Area Reduction","authors":"B. Fort, D. Capalija, Z. Vranesic, S. Brown","doi":"10.1109/FCCM.2006.10","DOIUrl":"https://doi.org/10.1109/FCCM.2006.10","url":null,"abstract":"The growth in size and performance of field programmable gate arrays (FPGAs) has compelled system-on-a-programmable-chip (SoPC) designers to use soft processors for controlling systems with large numbers of intellectual property (IP) blocks. Soft processors control IP blocks, which are accessed by the processor as peripheral devices and/or through custom instructions (CIs). In large systems, chip multiprocessors (CMPs) are used to execute many programs concurrently. When these programs require the use of the same IP blocks which are accessed as peripheral devices, they may have to stall waiting for their turn. In the case of CIs, the FPGA logic blocks that implement the CIs may have to be replicated for each processor. In both of these cases FPGA area is wasted, either by idle soft processors or the replication of CI logic blocks. This paper presents a multithreaded (MT) soft processor for area reduction in SoPC implementations. An MT processor allows multiple programs to access the same IP without the need for the logic replication or the replication of whole processors. We first designed a single-threaded processor that is instruction-set compatible with Altera's Nios II soft processor. Our processor is approximately the same size as the Nios II economy version, with equivalent performance. We augmented our processor to have 4-way interleaved multithreading capabilities. This paper compares the area usage and performance of the MT processor versus two CMP systems, using Altera's and our single-threaded processors, separately. Our results show that we can achieve an area savings of about 45% for the processor itself, in addition to the area savings due to not replicating CI logic blocks.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124426465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 75
Combining Instruction Coding and Scheduling to Optimize Energy in System-on-FPGA
R. Dimond, O. Mencer, W. Luk
{"title":"Combining Instruction Coding and Scheduling to Optimize Energy in System-on-FPGA","authors":"R. Dimond, O. Mencer, W. Luk","doi":"10.1109/FCCM.2006.31","DOIUrl":"https://doi.org/10.1109/FCCM.2006.31","url":null,"abstract":"In this paper, we investigate a combination of two techniques, instruction coding and instruction re-ordering, for optimizing energy in embedded processor control. We present the first practical hardware implementation incorporating both approaches as part of a novel flow for automatic power-optimization of an FPGA soft processor. Our infrastructure generates customized processors and associated software, to enable power optimizations to be evaluated on multiple architectures and FPGA platforms. We evaluate using both software estimates of power and actual measurements from both low-cost and high-performance FPGAs. We generate over 150 optimized processor designs for two FPGA platforms, two processor architectures and six different benchmarks at four different clock rates and achieve consistent measured dynamic power reduction of up to 74%, without performance cost. Our results are applicable beyond processor optimization, quantifying the benefits of practical switching reduction and highlighting non-obvious pitfalls and complexities in dynamic power optimization.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124753448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20