Koray Öner, L. Barroso, S. Iman, Jaeheon Jeong, Krishnan Ramamurthy, M. Dubois
{"title":"The Design of RPM: An FPGA-based Multiprocessor Emulator","authors":"Koray Öner, L. Barroso, S. Iman, Jaeheon Jeong, Krishnan Ramamurthy, M. Dubois","doi":"10.1145/201310.201321","DOIUrl":"https://doi.org/10.1145/201310.201321","url":null,"abstract":"Recent advances in Field-Programmable Gate Arrays (FPGA) and programmable interconnects have made it possible to build efficient hardware emulation engines. In addition, improvements in Computer-Aided Design (CAD) tools, mainly in synthesis tools, greatly simplify the design of large circuits. The RPM (Rapid Prototype Engine for Multiprocessors) Project leverages these two technological advances. Its goal is to develop a common hardware platform for the emulation of multiprocessor systems with different architectures. For cost reasons, the use of FPGAs in RPM is limited to the memory controllers, while the rest of the emulator, including the processors, memories and interconnect, is built with off-the-shelf components. A flexible non-intrusive event logging mechanism is included at all levels of the memory hierarchy, making it possible to monitor the emulation in very fine detail. This paper presents the hardware design of RPM.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128085199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applications of Slack Neighborhood Graphs to Timing Driven Optimization Problems in FPGAs","authors":"Anmol Mathur, Kuang-Chien Chen, C. Liu","doi":"10.1145/201310.201329","DOIUrl":"https://doi.org/10.1145/201310.201329","url":null,"abstract":"In this paper we examine three different problems related to FPGA placement: timing driven placement of a technology mapped circuit, timing driven reconfiguration for yield enhancement and fault tolerance in FPGAs and timing driven design re-engineering for FPGAs. We show that timing driven relocation which transforms an infeasible placement into a feasible one is a key problem the solution of which will lead to good algorithms for all three of these optimization problems. We introduce the concept of a slack neighborhood graph (SNG) as a general tool for timing driven relocation of modules in an infeasible placement with a bounded increase in critical path delay. The slack neighborhood graph approach provides a unified approach to the solution of the three timing driven optimization problems of interest in this paper.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132202303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Field-Programmable Mixed-Analog-Digital Array","authors":"P. Chow, P. Gulak","doi":"10.1145/201310.201327","DOIUrl":"https://doi.org/10.1145/201310.201327","url":null,"abstract":"A novel field-programmable mixed-analog-digital array (FPMA) is proposed, which contains a field-programmable analog array, a field-programmable digital array, and a mixed-signal interface. This device is intended to be used for the rapid implementation of mixed-signal circuits. The resource and architectural requirements for this array are determined by analyzing a set of sample circuits. The mixed-signal interface is constructed from converter blocks that contain configurable A/D and D/A converters, which gives some flexibility in the specification of the interface. A 1.2 mm CMOS prototype IC has been designed to demonstrate the feasibility of FPMA technology.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124555445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Techniques for FPGA Implementation of Video Compression Systems","authors":"B. Schoner, J. Villasenor, S. Molloy, R. Jain","doi":"10.1145/201310.201334","DOIUrl":"https://doi.org/10.1145/201310.201334","url":null,"abstract":"Real-time video compression is a challenging subject for FPGA implementation because it typically has a large computational complexity and requires high data throughput. Previous implementations have used parallel banks of FPGAs or DSPs to meet these requirements. Using design techniques that maximize FPGA utilization, we have implemented two video compression systems, each of which uses a single FPGA. In the first system, algorithmic optimizations are made to create a low-complexity implementation that exploits the in-system programmability of the FPGA. This low-complexity implementation performs well, but is limited to a single compression algorithm. In the second system, the FPGA is augmented with an external, low-complexity, video signal processor (VSP.) This combination of ASIC and FPGA is flexible enough to implement four common compression algorithms, and powerful enough to execute them in real time.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122532798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TIERS: Topology IndependEnt Pipelined Routing and Scheduling for VirtualWire™ Compilation","authors":"C. Selvidge, A. Agarwal, M. Dahl, J. Babb","doi":"10.1145/201310.201314","DOIUrl":"https://doi.org/10.1145/201310.201314","url":null,"abstract":"TIERS is a new pipelined routing and scheduling algorithm implemented in a completeVirtualWire TM compilation and synthesis system. TIERS is described and compared to prior work both analytically and quantitatively. TIERS improves system speed by as much as a factor of 2.5 over prior work. TIERS routing results for both Altera and Xilinx based FPGA systems are provided.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128529054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping","authors":"J. Cong, Yean-Yow Hwang","doi":"10.1145/201310.201322","DOIUrl":"https://doi.org/10.1145/201310.201322","url":null,"abstract":"In this paper, we present an improvement of the FlwoMap algorithm, named CutMap, which combines depth and area minimization during the mapping process by computing min-cost min-height K-feasible cuts for critical nodes for depth minimization and computing min-cost K-feasible cuts for non-critical nodes for area minimization. CutMap guarantees depth-optimal mapping solutions in polynomial time as the FlowMap algorithm but uses considerably fewer K-LUTs. We have implemented CutMap and tested it on the MCNC logic synthesis benchmarks. For depth-optimal mapping solutions, CutMap uses 15% fewer K-LUTs than FlowMap. We also tested CutMap followed by the depth relaxation routines in FlowMap_r algorithm, which achieves area minimization by depth relaxation. CutMap followed FlowMap_r performs better than FlowMap_r.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"2003 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125769900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Level Bit-Serial Datapath Synthesis for Multi-FPGA Systems","authors":"T. Isshiki, W. Dai","doi":"10.1145/201310.201336","DOIUrl":"https://doi.org/10.1145/201310.201336","url":null,"abstract":"Field-programmable hardware exhibits a new trend towards computation-intensive applications. The basic idea is to completely customize the hardware architecture for the very given application in order to allocation the logic resources efficiently and effectively, improving the performance several orders of magnitude greater than general-purpose processor implementation. And at the same time, it still covers a wide variety of applications for their reconfigurability.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123139237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple FPGA Partitioning with Performance Optimization","authors":"Kalapi Roy-Neogi, C. Sechen","doi":"10.1145/201310.201333","DOIUrl":"https://doi.org/10.1145/201310.201333","url":null,"abstract":"We address the problem of partitioning a technology mapped FPGA circuit onto multiple FPGAs of a specific target technology. The physical characteristics of the multiple FPGA system (MFS) pose additional constraints to the circuit partitioning algorithms: the capacity of each FPGA, the timing constraints, the number of I/Os per FPGA, and the pre-designed interconnection patterns of the MFS. Existing partitioning techniques which minimize just the cut sizes of partitions fail to satisfy the above challenges. We therefore present a rectilinear partitioning algorithm which efficiently and accurately handles timing specifications. The signal path delays are estimated during partitioning using a timing model specific to a multiple FPGA architecture. The model combines all possible delay factors in a system with multiple FPGA chips of a target technology. A new dynamic net-weighting scheme was incorporated to minimize the number of pin-outs for each chip. Finally, we have developed a graph-based global router for pin assignment which can handle the pre-routed connections of our MFS structure. We successfully partitioned the MCNC Xilinx FPGA benchmarks producing 100% routable designs with high utilization levels in all cases. Using the performance optimization capabilities in our approach we have successfully partitioned these benchmarks satisfying the critical path constraints and achieving a significant reduction in the longest path delay. An average reduction of 17% in the longest path delay was achieved at the cost of 5% in total wire length. We have proved the effectiveness of our performance optimization technique by verifying the timing predictions of our partitioner with the actual delays obtained after placement and routing of a partitioned MFS. Partitioning results obtained with the Xilinx mapped MCNC benchmarks are encouraging.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134381169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Architectural \"Families\" to Increase FPGA Speed and Density","authors":"Vaughn Betz, Jonathan Rose","doi":"10.1145/201310.201312","DOIUrl":"https://doi.org/10.1145/201310.201312","url":null,"abstract":"In order to narrow the speed and density gap between FPGAs and MPGAs we propose the development of \"families\" of FPGAs. Each FPGA family is targeted at a single maximum logic capacity, and consists of several \"siblings\", or FPGAs of different yet complementary architectures. Any given application circuit is implemented in the sibling with the most appropriate architecture. With properly chosen siblings, one can develop a family of FPGAs which will have better speed and density than any single FPGA. We apply this concept to create two different FPGA families, one composed of architectures with different types of hard-wired logic blocks and the other created from architectures with different types of heterogeneous logic blocks. We found that a family composed of eight chips with different hard-wired logic block architectures simultaneously improves density by 12 to 14% and speed by 18 to 20% over the best single hard-wired FPGA.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"601 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133336699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HGA: A Hardware-Based Genetic Algorithm","authors":"S. Scott, A. Samal, S. Seth","doi":"10.1145/201310.201319","DOIUrl":"https://doi.org/10.1145/201310.201319","url":null,"abstract":"A genetic algorithm (GA) is a robust problem-solving method based on natural selection. Hardware's speed advantage and its ability to parallelize offer great rewards to genetic algorithms. Speedups of 1-3 orders of magnitude have been observed when frequently used software routines were implemented in hardware by way of reprogrammable field-programmable gate arrays (FPGAs). Reprogrammability is essential in a general-purpose GA engine because certain GA modules require changeability (e.g. the function to be optimized by the GA). Thus a hardware-based GA is both feasible and desirable. A fully functional hardware-based genetic algorithm (the HGA) is presented here as a proof-of-concept system. It was designed using VHDL to allow for easy scalability. It is designed to act as a coprocessor with the CPU of a PC. The user programs the FPGAs which implement the function to be optimized. Other GA parameters may also be specified by the user. Simulation results and performance analyses of the HGA are presented. A prototype HGA is described and compared to a similar GA implemented in software. In the simple tests, the prototype took about 6% as many clock cycles to run as the software-based GA. Further suggested improvements could realistically make the HGA 2-3 orders of magnitude faster than the software-based GA.","PeriodicalId":396858,"journal":{"name":"Third International ACM Symposium on Field-Programmable Gate Arrays","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125530117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}