{"title":"Optimum wordlength allocation","authors":"G. Constantinides, P. Cheung, W. Luk","doi":"10.1109/FPGA.2002.1106676","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106676","url":null,"abstract":"This paper presents an approach to the wordlength allocation and optimization problem for linear digital signal processing systems implemented in Field-Programmable Gate Arrays. The proposed technique guarantees an optimum set of wordlengths for each internal variable, allowing the user to trade-off implementation area for error at system outputs. Optimality is guaranteed through modelling as a mixed integer linear program, constructed through novel techniques for the linearization of error and area constraints. Optimum results in this field are valuable since they can be used to assess the effectiveness of heuristic wordlength optimization techniques. It is demonstrated that one such previously published heuristic reaches within 0.7% of the optimum area over a range of benchmark problems.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122195364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Budiu, M. Mishra, Ashwin R. Bharambe, S. Goldstein
{"title":"Peer-to-peer hardware-software interfaces for reconfigurable fabrics","authors":"M. Budiu, M. Mishra, Ashwin R. Bharambe, S. Goldstein","doi":"10.1109/FPGA.2002.1106661","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106661","url":null,"abstract":"In this paper we describe a peer-to-peer interface between processor cores and reconfigurable fabrics. The main advantage of the peer-to-peer model is that it greatly expands the scope of application for reconfigurable computing and hence its potential benefits. The primary extension in our model is that \"code\" on the reconfigurable hardware unit is allowed to invoke routines both on the reconfigurable unit itself and on the fixed logic processor We describe the software constructs and compilation mechanisms needed for such an architecture, including a detailed description of the interface between the two parts of the application.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131651616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Sima, S. Cotofana, S. Vassiliadis, J. V. Eijndhoven, K. Vissers
{"title":"MPEG-compliant entropy decoding on FPGA-augmented TriMedia/CPU64","authors":"M. Sima, S. Cotofana, S. Vassiliadis, J. V. Eijndhoven, K. Vissers","doi":"10.1109/FPGA.2002.1106680","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106680","url":null,"abstract":"The paper presents a Design Space Exploration (DSE) experiment which has been carried out in order to determine the optimum FPGA-based Variable-Length Decoder (VLD) computing resource and its associated instructions, with respect to an entropy decoding task which is to be executed on the FPGA-augmented TriMedia/CPU64 processor We first outline the extension of the TriMedia/CPU64 architecture, which consists of an FPGA-based Reconfigurable Functional Unit (RFU) and the associated generic instructions. Then we address entropy decoding and propose a strategy to partially break the data dependency related to variable-length decoding. Three VLDs (VLD-1, VLD-2, VLD-3) instructions which can return 1, 2, or 3 symbols, respectively, are subsequently analyzed. After completing the DSE, we determined that VLD-2 instruction leads to the most efficient entropy decoding in terms of instruction cycles and FPGA area. The FPGA-based implementation of the computing resource associated to VLD-2 instruction is subsequently presented. When mapped on an ACEX EP1K100 FPGA from Altera, VLD-2 exhibits a latency of 8 TriMedia cycles, and uses all the Electronic Array Blocks and 51% of the logic cells of the device. The simulation results indicate that the VLD-2-based entropy decoder is 43% faster than its pure software counterpart.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114999001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable shape-adaptive template matching architectures","authors":"J. Gause, P. Cheung, W. Luk","doi":"10.1109/FPGA.2002.1106665","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106665","url":null,"abstract":"This paper presents reconfigurable computing strategies for a Shape-Adaptive Template Matching (SA-TM) method to retrieve arbitrarily shaped objects within images or video frames. A generic systolic array architecture is proposed as the basis for comparing three designs: a static design where the configuration does not change after compilation, a partially-dynamic design where a static circuit can be reconfigured to use different on-chip data, and a dynamic design which completely, adapts to a particular computation. While the logic resources required to implement the static and partially-dynamic designs are constant and depend only on the size of the search frame, the dynamic design is adapted to the size and shape of the template object, and hence requires much less area. The execution time of the matching process greatly depends on the number of frames the same object is matched at. For a small number of frames, the dynamic and partially-dynamic designs suffer from high reconfiguration overheads. This overhead is significantly reduced if the matching process is repeated on a large number of consecutive frames. We find that the dynamic SA-TM design in a 50 MHz Virtex 1000E device, including reconfiguration time, can perform almost 7,000 times faster than a 1.4 GHz Pentium 4 PC when processing a 100/spl times/100 template on 300 consecutive video frames in HDTV format.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114286712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeffrey J. Cook, Derek B. Gottlieb, Joshua D. Walstrom, Steve Ferrera, Chi-Wei Wang, N. Carter
{"title":"Mapping algorithms to the Amalgam programmable-reconfigurable processor","authors":"Jeffrey J. Cook, Derek B. Gottlieb, Joshua D. Walstrom, Steve Ferrera, Chi-Wei Wang, N. Carter","doi":"10.1109/FPGA.2002.1106697","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106697","url":null,"abstract":"The Amalgam programmable-reconfigurable processor is designed to provide the computational power required by upcoming embedded applications without requiring the design of application-specific hardware. It integrates multiple programmable processors and blocks of reconfigurable logic onto a single chip, using a clustered architecture, similar to the one used on the M-Machine to reduce wire length and delay and allow implementation at high clock rates. The clustered architecture provides tremendous flexibility, allowing applications to exploit parallelism at whatever granularity is best-suited to the application, while the combination of reconfigurable logic and programmable processors delivers much higher performance than could be achieved through programmable processors alone. This abstract presents the results of our initial experiments in hand-mapping applications onto Amalgam. Five applications (IDCT, Rijndael encryption, nQueens, DNA sequence comparison, and image dithering) have been implemented, achieving speedups ranging from 8.7/spl times/ to 23.2/spl times/ over the performance of a single programmable cluster by using the complete resources of an Amalgam chip.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133450492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PAM-Blox II: design and evaluation of C++ module generation for computing with FPGAs","authors":"O. Mencer","doi":"10.1109/FPGA.2002.1106662","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106662","url":null,"abstract":"This paper explores the implications of integrating flexible module generation into a compiler for FPGAs. The objective is to improve the programmability of FPGAs, or in other words, the productivity of the FPGA programmer. We describe (1) the module generation library PAM-Blox II, the second generation of object-oriented module generators in C++, targeted at computing with FPGAs, and (2) examples of design tradeoffs and performance results using redundant representations for addition and multiplication, and technology mapping of comparison and elementary function evaluation. PAM-Blox II is built on top of a set of extensions to the gate level FPGA design library PamDC to provide a more efficient, portable, scalable, and maintainable module generator library. Using PAM-Blox II we demonstrate a simplified interface to bit-level programability. The simplification results from the bottom-up approach and a close coupling of architecture generation, module generation and gate level CAD. The tradeoffs for the module generators are based on trading area for speed and hand-optimizing technology mapping to the specific FPGA technology. As an example, we show that redundant number representations hold one key to unleashing the full potential of reconfigurability on the bit-level. The presented module generators are applied to encryption and compression to show the impact of the bit-level optimizations on application performance.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126060847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Control and configuration software for a reconfigurable networking hardware platform","authors":"T. Sproull, J. Lockwood, David E. Taylor","doi":"10.1109/FPGA.2002.1106660","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106660","url":null,"abstract":"A suite of tools called NCHARGE (Networked Configurable Hardware Administrator for Reconfiguration and Governing via End-systems) has been developed to simplify the co-design of hardware and software components that process packets within a network of Field Programmable Gate Arrays (FPGAs). A key feature of NCHARGE is that it provides a high-performance packet interface to hardware and standard Application Programming Interface (API) between software and reprogrammable hardware modules. Using this API, multiple software processes can communicate to one or more hardware modules using standard TCP/IP sockets. NCHARGE also provides a Web-Based User Interface to simplify the configuration and control of an entire network switch that contains several software and hardware modules.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130213746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Böhm, J. Beveridge, B. Draper, C. Ross, M. Chawathe, W. Najjar
{"title":"Compiling ATR probing codes for execution on FPGA hardware","authors":"A. Böhm, J. Beveridge, B. Draper, C. Ross, M. Chawathe, W. Najjar","doi":"10.1109/FPGA.2002.1106693","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106693","url":null,"abstract":"This paper describes the implementation of an automatic target recognition (ATR) Probing algorithm on a reconfigurable system, using the SA-C programming language and optimizing compiler. The reconfigurable system is 800 times faster than a comparable Pentium running a C implementation of the same probing task. The reasons for this are analyzed.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128904311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Hutchings, R. Franklin, Daniel Carver, Gordon Brebner
{"title":"Assisting network intrusion detection with reconfigurable hardware","authors":"B. Hutchings, R. Franklin, Daniel Carver, Gordon Brebner","doi":"10.1109/FPGA.2002.1106666","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106666","url":null,"abstract":"String matching is used by Network Intrusion Detection Systems (NIDS) to inspect incoming packet payloads for hostile data. String-matching speed is often the main factor limiting NIDS performance. String-matching performance can be dramatically improved by using Field-Programmable Gate Arrays (FPGAs); accordingly, a \"regular-expression to FPGA circuit\" module generator has been developed. The module generator extracts strings from the Snort NIDS rule-set, generates a regular expression that matches all extracted strings, synthesizes a FPGA-based string matching circuit, and generates an EDIF netlist that can be processed by Xilinx software to create an FPGA bitstream. The feasibility of this approach is demonstrated by comparing the performance of the FPGA-based string matcher against the software-based GNU regex program. The FPGA-based string matcher exceeds the performance of the software-based system by 600x for large patterns.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124374273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image registration of real-time video data using the SONIC reconfigurable computer platform","authors":"W. Melis, P. Cheung, W. Luk","doi":"10.1109/FPGA.2002.1106656","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106656","url":null,"abstract":"This paper is concerned with the image registration problem as applied to video sequences that have been subjected to geometric distortions. This work involves the development of a computationally efficient algorithm to restore the video sequence using image registration techniques. An approach based on motion vectors is proposed and is found to be successful in restoring the video sequence for any affine transform based distortion. The algorithm is implemented in FPGA hardware targeted for a reconfigurable computing platform called SONIC It is shown that the algorithm can efficiently restore the video data in realtime.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131216626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}