{"title":"A Novel Heuristic and Provable Bounds for Reconfigurable Architecture Design","authors":"Alastair M. Smith, G. Constantinides, P. Cheung","doi":"10.1109/FPL.2006.311261","DOIUrl":"https://doi.org/10.1109/FPL.2006.311261","url":null,"abstract":"This paper is concerned with the application of formal optimisation methods to the design of mixed-granularity FPGAs. In particular, the authors investigate the appropriate mix and floorplan of heterogeneous elements (multipliers, RAMs, and LUT-based logic) in order to maximise the performance of a set of DSP benchmark applications, given a fixed silicon budget. The authors extend their previous mathematical programming framework by proposing a novel set of heuristics capable of providing upper bounds on the achievable reconfigurable-to-fixed-logic performance ratio. The results provide, for the first time, quantifications of the optimal performance/area-enhancing capability of multipliers and RAM blocks within a system context, and indicate that only a minimal performance benefit can be achieved over Virtex II by re-organising the device floorplan when using optimal technology mapping","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113960450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COMMA: A Communications Methodology for Dynamic Module Reconfiguration in FPGAs","authors":"S. Koh, O. Diessel","doi":"10.1109/FCCM.2006.32","DOIUrl":"https://doi.org/10.1109/FCCM.2006.32","url":null,"abstract":"On-going improvements in the scaling of FPGA device sizes and time-to-market pressures encourage the use of module-oriented design flows [3], while economic factors favour the reuse of smaller devices for high performance computational tasks. One of the core problems in proposing dynamic modular reconfiguration approaches is supporting the differing communications needs of the sequence of modules configured over time [2]. Proposals to date have not focussed on communications issues. Moreover, they have advocated the use of specific protocols [4], or they cannot be readily implemented [1], or they suffer from high overheads [5], or rely upon deprecated features such as tri-state lines [7]. In contrast, we propose a methodology for the rapid deployment of a communications infrastructure that provides the wires required by dynamic modules and allows users to implement the protocols they want. Our aim is to support new tiled dynamically reconfigurable architectures such as Virtex-4, as well as mature device families.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124846396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating FPGA Acceleration into the Protomol Molecular Dynamics Code: Preliminary Report","authors":"Y. Gu, T. Court, M. Herbordt","doi":"10.1109/FCCM.2006.52","DOIUrl":"https://doi.org/10.1109/FCCM.2006.52","url":null,"abstract":"The authors describe a new pipeline for computing non-bonded forces and its integration into the ProtoMol molecular dynamics (MD) code. There are several innovations: a novel interpolation strategy, including use of higher-order terms; coefficient generation with orthonormal functions; the introduction of \"semi-floating point\" numbering; and various issues related to system integration. As a result, we are able to model far more particle types, without relying on complex buffering, and obtain higher accuracy than before. A two-pipeline accelerator has been implemented on a 2004-era Xilinx VirtexII Pro VP70, integrated into ProtoMol, and tested with an enzyme inhibitor model having 8000 particles and 26 particle types. Despite performing all O(n) work on the host PC, as well as the data conversion and communication overhead, this implementation yields 5.5x to 15.7x speed-ups over a 2.8GHz PC (depending on whether cell lists are used), with accuracy comparable to the serial code","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"401 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124543955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Energy Reconfigurable Fabric for the SuperCISC Architecture","authors":"Gayatri Mehta, J. Stander, Joshua M. Lucas, R. Hoare, Brady Hunsaker, A. Jones","doi":"10.1166/jolpe.2006.073","DOIUrl":"https://doi.org/10.1166/jolpe.2006.073","url":null,"abstract":"Hardware acceleration using field programmable gate arrays (FPGAs) has become increasingly popular for computationally intensive digital signal processing (DSP) applications. Unfortunately, while FPGAs have a reasonably tractable computer-aided design (CAD) flow and performance, they have poor power characteristics when compared to direct application-specific integrated circuit (ASIC) fabrication. ASICs exhibit better performance and power than FPGAs, but require complex CAD and large non-recurring engineering (NRE) costs. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. Several coarse-grained fabric architectures proposed during the last decade have focused on performance and area-efficient architectural techniques. Even though power is becoming one of the critical design concerns for the semiconductor industry, this issue has not been adequately addressed in the existing coarse-grained fabric architectures. In this paper, a low-power and high-performance hardware acceleration engine for DSP-style applications is described. This reconfigurable fabric model is generic and parameterizable, allowing design parameters to be adjusted within the architecture. The impact of varying design parameters such as functional unit granularity and multiplexer cardinality is studied for its implications on power and performance. The low-power fabric was designed to operate within the SuperCISC processor architecture designed at the University of Pittsburgh","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129691206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Optimized Finite Difference Computing Engine on FPGAs","authors":"Chuan He, Guan Qin, Mi Lu, Wei Zhao","doi":"10.1109/FCCM.2006.24","DOIUrl":"https://doi.org/10.1109/FCCM.2006.24","url":null,"abstract":"Time-domain and frequency-domain Finite Difference (FD) methods are among the most popular numerical modelling techniques for solving scientific and engineering problems. However, these simulations are still time-consuming and cannot be used routinely except in institutes that can afford the high cost of running and maintaining supercomputers or large PC-cluster systems. In this paper, we present an efficient implementation of an FPGA-based FD computing engine, using acoustic wave modeling problems as an example. Instead of following the formal high-order FD expressions with standard IEEE-754 compliant floating-point multipliers and adders, we propose a new class of optimized FD schemes whose FD coefficients are optimized to only a few binary bits, so that far fewer Logic Cell (LC) resources or on-chip multipliers are needed without degrading numerical accuracy. Furthermore, we simplify the implementation of the subsequent floating-point summations using a group-alignment technique. A floating-point/fixed-point hybrid accumulator with similar relative and absolute rounding errors now replaces the conventional, costly floating-point adder tree.","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127968368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Switch Box Architectures for Three-Dimensional FPGAs","authors":"Aman Gayasen, N. Vijaykrishnan, M. Kandemir, Arifur Rahman","doi":"10.1109/FCCM.2006.66","DOIUrl":"https://doi.org/10.1109/FCCM.2006.66","url":null,"abstract":"In this paper, the authors explore six 3D switch box (SB) topologies for the case when the vias are fewer than the horizontal wires. Using detailed area and delay models, we estimate their impact on FPGA area, delay, and area-delay product. The results indicate that the area-delay product (ADP) depends heavily on the SB topology: our best SB reduces ADP by 9% compared to the subset SB","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123496531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Co-Verification Tool for a High Level Language Compiler for FPGAs","authors":"C. Ross, A. Böhm","doi":"10.1109/FCCM.2006.6","DOIUrl":"https://doi.org/10.1109/FCCM.2006.6","url":null,"abstract":"The authors have described a method of testing various implementations of co-designs generated by the SA-C compiler. Each form can be examined using co-simulation. The host code is able to communicate with an FPGA board simulated in ModelSim as if it were physical hardware. The co-simulation approach briefly described in this paper allows us to test and analyze all parts of the complete co-design. In essence, the compiler is able to perform automated co-verification for any SA-C program. At the highest level of simulation, it allows functional verification of the VHDL generated by the compiler. At the lowest level of detail, the FPGA simulation is phase-accurate and mimics the hardware behavior down to the individual configurable logic block","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131305796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A case study in porting a production scientific supercomputing application to a reconfigurable computer","authors":"V. Kindratenko, D. Pointer","doi":"10.1109/FCCM.2006.5","DOIUrl":"https://doi.org/10.1109/FCCM.2006.5","url":null,"abstract":"This case study presents the results of porting a production scientific code, called NAMD, to the SRC-6 high-performance reconfigurable computing platform based on field programmable gate array (FPGA) technology. NAMD is a molecular dynamics code designed to run on large supercomputing systems and used extensively by the computational biophysics community. NAMD's computational kernel is highly optimized to run on conventional von Neumann processors; this presents numerous challenges to its reimplementation on FPGA architecture. This paper presents an overview of the SRC-6 architecture and the NAMD application and then discusses the challenges, solutions, and results of the porting effort. The rationale in choosing the development path taken and the general framework for porting an existing scientific code, such as NAMD, to the SRC-6 platform are presented and discussed in detail. The results and methods presented in this paper are applicable to a large class of problems in scientific computing","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122951736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Field Programmable RFID Tag and Associated Design Flow","authors":"A. Jones, R. Hoare, S. Dontharaju, S. Tung, Ralph Sprang, Joshua Fazekas, J. T. Cain, M. Mickle","doi":"10.1109/FCCM.2006.7","DOIUrl":"https://doi.org/10.1109/FCCM.2006.7","url":null,"abstract":"Current radio frequency identification (RFID) systems generally have long design times and low tolerance to changes in specification. This paper describes a field programmable, low-power active RFID tag and its associated specification and automated design flow. RFID primitives to be supported by the tag are enumerated with RFID macros, or assembly-like descriptions of the tag operations. From these, the RFID preprocessor generates templates automatically. The behavior of each RFID primitive is specified using ANSI C in the template. The resulting file is compiled by the RFID compiler. A smart buffer sits between the transceiver and the tag controller to detect whether incoming packets are intended for the tag, so that the main controller may remain powered down to reduce power consumption. Two system-on-a-chip implementation strategies are presented. The first is a microprocessor-based system for which a C program is automatically generated. The second includes a block of low-power FPGA logic: the user-supplied RFID logic in ANSI C is automatically converted into combinational VHDL by the RFID compiler. Based on a test program, the processors required 183, 43, and 19 μJ per transaction for the StrongARM, XScale, and EISC processors, respectively. By replacing the processor with a CoolRunner-II, the controller's energy can be reduced to 1.11 nJ per transaction","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124973135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Highly Efficient String Matching Circuit for IDS with FPGA","authors":"T. Katashita, A. Maeda, K. Toda, Y. Yamaguchi","doi":"10.1109/FCCM.2006.51","DOIUrl":"https://doi.org/10.1109/FCCM.2006.51","url":null,"abstract":"String matching circuits for intrusion detection systems have been studied extensively. One such approach, the NFA-based string matching circuit, can be expanded to wider processing data widths. However, because its resource requirement increases markedly, it has been difficult to implement an NFA-based string matching circuit covering the whole Snort 2.3.3 rule set (35461 characters) that processes at 10 Gbps on a single FPGA. In this paper, the authors propose a highly efficient string matching circuit for FPGAs. In our circuit, redundant AND gates and states in the NFA are eliminated to reduce the resource requirement. Consequently, our circuit requires over 50% fewer resources than a previous NFA-based circuit, and the synthesis results show that a string matching circuit that includes the whole Snort 2.3.3 rule set can be implemented on a single xc2vp-100-6 FPGA with throughput over 10 Gbps","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122137784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}