2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines最新文献_第3页

A Programmable, Maximal Throughput Architecture for Neighborhood Image Processing 一种用于邻域图像处理的可编程、最大吞吐量架构

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.13

R. Porter, J. Frigo, M. Gokhale, C. Wolinski, François Charot, Charles Wagner

引用次数: 3

Rapid Design of Special-Purpose Pipeline Processors with FPGAs and its Application to Computational Fluid Dynamics 基于fpga的专用管道处理器快速设计及其在计算流体力学中的应用

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.60

G. Lienhart, G. M. Martinez, A. Kugel, R. Männer

引用次数: 1

Packet Switched vs. Time Multiplexed FPGA Overlay Networks 分组交换与时间复用FPGA覆盖网络

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.55

Nachiket Kapre, Nikil Mehta, Michael DeLorimier, Raphael Rubin, Henry Barnor, M. J. Wilson, M. Wrighton, A. DeHon

{"title":"Packet Switched vs. Time Multiplexed FPGA Overlay Networks","authors":"Nachiket Kapre, Nikil Mehta, Michael DeLorimier, Raphael Rubin, Henry Barnor, M. J. Wilson, M. Wrighton, A. DeHon","doi":"10.1109/FCCM.2006.55","DOIUrl":"https://doi.org/10.1109/FCCM.2006.55","url":null,"abstract":"Dedicated, spatially configured FPGA interconnect is efficient for applications that require high throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g. wiring up gates and datapaths). Applications which virtualize PEs may require a large number of distinct PE-to-PE connections (e.g. using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE's operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources for all possible connections is costly and inefficient. Alternatively, we can time share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g. area, route latency, route quality) between time-multiplexed and packet-switched networks overlayed on top of commodity FPGAs. We demonstrate modular and scalable networks which operate on a Xilinx XC2V6000-4 at 166MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When applying designs to equivalent area, packet-switching is up to 2times faster for small area designs while time-multiplexing is up to 5times faster for larger area designs. When limited to the capacity of a XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127729400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 150

General Architecture for Hardware Implementation of Genetic Algorithm 遗传算法硬件实现的通用体系结构

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.43

Tatsuhiro Tachibana, Y. Murata, N. Shibata, K. Yasumoto, Minoru Ito

引用次数: 30

Efficient Hardware Generation of Random Variates with Arbitrary Distributions 具有任意分布的随机变量的高效硬件生成

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.39

David B. Thomas, W. Luk

引用次数: 19

The STAR-C Truth: Analyzing Reconfigurable Supercomputing Reliability STAR-C真相:分析可重构超级计算的可靠性

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.70

H. Quinn, D. Bhaduri, C. Teuscher, P. Graham, M. Gokhale

引用次数: 3

Hardware/Software Approach to Molecular Dynamics on Reconfigurable Computers 可重构计算机上分子动力学的硬件/软件方法

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.46

R. Scrofano, M. Gokhale, F. Trouw, V. Prasanna

引用次数: 70

Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components 基于fpga的流水线混合精度算法用于低精度元件的快速精确PDE求解

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.57

R. Strzodka, Dominik Göddeke

{"title":"Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components","authors":"R. Strzodka, Dominik Göddeke","doi":"10.1109/FCCM.2006.57","DOIUrl":"https://doi.org/10.1109/FCCM.2006.57","url":null,"abstract":"FPGAs are becoming more and more attractive for high precision scientific computations. One of the main problems in efficient resource utilization is the quadratically growing resource usage of multipliers depending on the operand size. Many research efforts have been devoted to the optimization of individual arithmetic and linear algebra operations. In this paper the authors take a higher level approach and seek to reduce the intermediate computational precision on the algorithmic level by optimizing the accuracy towards the final result of an algorithm. In our case this is the accurate solution of partial differential equations (PDEs). Using the Poisson problem as a typical PDE example the authors show that most intermediate operations can be computed with floats or even smaller formats and only very few operations (e.g. 1%) must be performed in double precision to obtain the same accuracy as a full double precision solver. Thus the FPGA can be configured with many parallel float rather than few resource hungry double operations. To achieve this, the authors adapt the general concept of mixed precision iterative refinement methods to FPGAs and develop a fully pipelined version of the conjugate gradient solver. The authors combine this solver with different iterative refinement schemes and precision combinations to obtain resource efficient mappings of the pipelined algorithm core onto the FPGA","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123480626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 79

A Multithreaded Soft Processor for SoPC Area Reduction 一种用于SoPC面积缩减的多线程软处理器

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.10

B. Fort, D. Capalija, Z. Vranesic, S. Brown

{"title":"A Multithreaded Soft Processor for SoPC Area Reduction","authors":"B. Fort, D. Capalija, Z. Vranesic, S. Brown","doi":"10.1109/FCCM.2006.10","DOIUrl":"https://doi.org/10.1109/FCCM.2006.10","url":null,"abstract":"The growth in size and performance of field programmable gate arrays (FPGAs) has compelled system-on-a-programmable-chip (SoPC) designers to use soft processors for controlling systems with large numbers of intellectual property (IP) blocks. Soft processors control IP blocks, which are accessed by the processor either as peripheral devices or/and by using custom instructions (CIs). In large systems, chip multiprocessors (CMPs) are used to execute many programs concurrently. When these programs require the use of the same IP blocks which are accessed as peripheral devices, they may have to stall waiting for their turn. In the case of CIs, the FPGA logic blocks that implement the CIs may have to be replicated for each processor. In both of these cases FPGA area is wasted, either by idle soft processors or the replication of CI logic blocks. This paper presents a multithreaded (MT) soft processor for area reduction in SoPC implementations. An MT processor allows multiple programs to access the same IP without the need for the logic replication or the replication of whole processors. We first designed a single-threaded processor that is instruction-set compatible to Altera's Nios II soft processor. Our processor is approximately the same size as the Nios II economy version, with equivalent performance. We augmented our processor to have 4-way interleaved multithreading capabilities. This paper compares the area usage and performance of the MT processor versus two CMP systems, using Altera's and our single-threaded processors, separately. Our results show that we can achieve an area savings of about 45% for the processor itself, in addition to the area savings due to not replicating CI logic blocks","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124426465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 75

Combining Instruction Coding and Scheduling to Optimize Energy in System-on-FPGA 结合指令编码和调度优化fpga系统中的能量

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Pub Date : 2006-04-24 DOI: 10.1109/FCCM.2006.31

R. Dimond, O. Mencer, W. Luk

引用次数: 20