{"title":"Implementing a simple continuous speech recognition system on an FPGA","authors":"S. Melnikoff, S. Quigley, M. J. Russell","doi":"10.1109/FPGA.2002.1106682","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106682","url":null,"abstract":"Speech recognition is a computationally demanding task, particularly the stage which uses Viterbi decoding for converting pre-processed speech data into words or sub-word units. We present an FPGA implementations of the decoder based on continuous hidden Markov models (HMMs) representing monophones, and demonstrate that it can process speech 75 times real time, using 45% of the slices of a Xilinx Virtex XCV1000.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121827041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effects of datapath placement and C-slow retiming on three computational benchmarks","authors":"N. Weaver, J. Wawrzynek","doi":"10.1109/FPGA.2002.1106694","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106694","url":null,"abstract":"Summary form only given. Two important optimizations within the FPGA design process, C-slow retiming and datapath placement, offer significant benefits for designers. Many have advocated and implemented tools to use these techniques in both automatic and semiautomatic manner but they have not made their way into conventional FPGA toolflows. C-slow retiming is a method of accelerating computations that include feedback loops. Instead of having a single instance of the computation, the feedback loop is pipelined so that C separate instances are all calculated simultaneously. This allows fine grained pipelining to occur even in designs that include feedback loops, such as single round cryptographic implementations or microprocessors. Done properly, it imposes a significant but not imposing latency penalty for single computations while offering huge increases in throughput. Datapath placement is simply constructing the design in a manner that accounts for the higher level data flows. This offers several benefits, including improved performance, more physically compact designs, shorter wires, and faster place and route times when the FPGA is heavily utilized. Even for designs with less structure which are amenable to simulated annealing, datapath placement may still offer a significant benefit. To clearly demonstrate the importance of these optimizations we have hand-modified three computational benchmarks which represent significant themes within FPGA computation: Rijndael/AES encryption, Smith/Waterman, and a simplified 32-bit microprocessor datapath. All three represent significantly different modes of computation within FPGAs, but all gain significantly from the use of these techniques.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122889451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable object detection in FLIR image sequences","authors":"Jonathan E. Scalera, Creed F. Jones, M. Soni, Mark B. Bucciero, P. Athanas, A. L. Abbott, Amitabh Mishra","doi":"10.1109/FPGA.2002.1106686","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106686","url":null,"abstract":"Future surveillance operations will require easily deployable \"microsensors\" that are capable of autonomous detection and identification of objects. These devices will operate under severe limitations on energy consumption, to enable battery-powered operation. They will assess sensor inputs locally, transmitting data only after objects of interest have been detected and extracted from sensor data. This paper describes a prototype system that detects and tracks moving objects in image sequences obtained from an infrared video camera. Computation in the system is distributed across an FPGA and a DSP chip. The current system analyzes input images, in search of objects that meet predefined criteria. If these criteria are met, the system extracts a sub-image (or \"chip\") that contains an object of interest, and then transmits that to a central site for manual review and further analysis.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"101-B 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125750365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GRIP: a reconfigurable architecture for host-based gigabit-rate packet processing","authors":"P. Bellows, J. Flidr, T. Lehman, Brian Schott, K. Underwood","doi":"10.1109/FPGA.2002.1106667","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106667","url":null,"abstract":"One of the fundamental challenges for modern high-performance network interfaces is the processing capabilities required to process packets at high speeds. Simply transmitting or receiving data at gigabit speeds fully utilizes the CPU on a standard workstation. Any processing that must be done to the data, whether at the application layer or the network layer, decreases the achievable throughput. This paper presents an architecture for offloading a significant portion of the network, processing from the host CPU onto the network interface. A prototype, called the GRIP (Gigabit Rate IPSec) card, has been constructed based on an FPGA coupled with a commodity Gigabit Ethernet MAC. Experimental results based on the prototype are presented and analyzed. In addition, a second generation design is presented in the context of lessons learned from the prototype.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130238530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast area estimation to support compiler optimizations in FPGA-based reconfigurable systems","authors":"D. Kulkarni, W. Najjar, R. Rinker, F. Kurdahi","doi":"10.1109/FPGA.2002.1106678","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106678","url":null,"abstract":"Several projects have developed compiler tools that translate high-level languages down to hardware description languages for mapping onto FPGA-based reconfigurable computers. These compiler tools can apply extensive transformations that exploit the parallelism inherent in the computations. However, the transformations can have a major impact on the chip area (number of logic blocks) used on the FPGA. It is imperative therefore that the compiler user be provided with feedback indicating how much space is being used. In this paper we present a fast compile-time area estimation technique to guide the compiler optimizations. Experimental results show that our technique achieves an accuracy within 2.5% for small image-processing operators, and within 5.0% for larger benchmarks, as compared to the usual post-compilation synthesis tool estimations. The estimation time is in the order of milliseconds as compared to several minutes for a synthesis tool.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132386525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precis: a design-time precision analysis tool","authors":"Mark L. Chang, S. Hauck","doi":"10.1109/FPGA.2002.1106677","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106677","url":null,"abstract":"Currently, few tools exist to aid the FPGA developer in translating an algorithm designed for a general-purpose-processor into one that is precision-optimized for FPGAs. This task requires extensive knowledge of both the algorithm and the target hardware. We present a design-time tool, Precis, which assists the developer in analyzing the precision requirements of algorithms specified in MATLAB. Through the combined use of simulation, user input, and program analysis, we demonstrate a methodology for precision analysis that can aid the developer in focusing their manual precision optimization efforts.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116947713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customising floating-point designs","authors":"A. A. Gaffar, W. Luk, P. Cheung, N. Shirazi","doi":"10.1109/FPGA.2002.1106698","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106698","url":null,"abstract":"This paper describes a method for customising the representation of floating-point numbers that exploits the flexibility of reconfigurable hardware. The method determines the appropriate size of mantissa and exponent for each operation in a design, so that a cost function with a given error specification for the output relative to a reference representation can be satisfied. Currently our tool, which adopts an iterative implementation of this method, supports single- or double-precision floating-point representation as the reference representation. It produces customised floating-point formats with arbitrary-sized mantissa and exponent. Results show that, for calculations involving large dynamic ranges, our method can achieve significant hardware reduction and speed improvement with respect to a design adopting the reference representation.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126943371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tabu search with intensification strategy for functional partitioning in hardware-software codesign","authors":"T. Wiangtong, P. Cheung, W. Luk","doi":"10.1109/FPGA.2002.1106691","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106691","url":null,"abstract":"This paper presents tabu search (TS) method with intensification strategy for hardware-software partitioning. The algorithm operates on functional blocks for designs represented as directed acyclic graphs (DAG), with the objective of minimising processing time under various hardware area constraints. Results are compared to two other heuristic search algorithms: genetic algorithm (GA) and simulated annealing (SA). The comparison involves a scheduling model based on list scheduling for calculating processing time used as a system cost, assuming that shared resource conflicts do not occur. The results show that TS, which rarely appears for solving this kind of problem, is superior to SA and GA in terms of both search time and the quality of solutions. In addition, we have implemented intensification strategy in TS called penalty reward, which can further improve the quality of results.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128699516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-assisted fast routing","authors":"A. DeHon, Randy Huang, J. Wawrzynek","doi":"10.1109/FPGA.2002.1106675","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106675","url":null,"abstract":"To fully realize the benefits of partial and rapid reconfiguration of field-programmable devices, we often need to dynamically schedule computing tasks and generate instance-specific configurations-new graphs which must be routed during program execution. Consequently, route time can be a significant overhead cost reducing the achievable net benefits of dynamic configuration generation. BY adding hardware to accelerate routing, we show that it is possible to compute routes in one thousandth the time of a traditional, software router and achieve routes that are within 5% of the state-of-the-art offline routing algorithms for a sample set of application netlists and within 25% for a set of difficult synthetic benchmarks. We further outline how strategic use of parallelism can allow the total route time to scale substantially less than linearly in graph size. We detail the source of the benefits in our approach and survey a range of options for hardware assistance that van, from a speedup of over 10/spl times/ with modest hardware overhead to speedups in excess of 1000/spl times/.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114344464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and analysis of a layer seven network processor accelerator using reconfigurable logic","authors":"G. Memik, S. Memik, W. Mangione-Smith","doi":"10.1109/FPGA.2002.1106668","DOIUrl":"https://doi.org/10.1109/FPGA.2002.1106668","url":null,"abstract":"In this paper, we present an accelerator that is designed to improve performance of network processing applications, particularly layer seven networking applications. The accelerator can easily be integrated in Network Processors. We present the design details of two different FPGA implementations: a design where each task is implemented in the accelerator and another one where the accelerator must be partially reconfigured for different tasks. We also present novel algorithms for important tasks such as tree lookup and pattern matching that utilize the accelerator. We show that the accelerator improves the overall execution time by as much as 20-times for these tasks. We show that the accelerator can improve the execution time of a representative layer seven application by an order of magnitude. Finally, we discuss the effects of reconfiguration time and frequency over the performance of the accelerator.","PeriodicalId":272235,"journal":{"name":"Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123771728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}