{"title":"Reconfigurable processor architectures exploiting high bandwidth optical channels","authors":"M. Sakr, S. Levitan, C. Lee Giles, D. Chiarulli","doi":"10.1109/FPGA.1998.707914","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707914","url":null,"abstract":"There is growing interest in studying the possibility of reconfigurable architectures as replacements for general purpose computing for certain application domains. Reconfigurable systems can take advantage of deep computational pipelines, perform concurrent execution and are inherently data flow in nature. Furthermore, these systems have the capability of 'on the fly' reconfiguration of all or portions of the hardware to represent all the functionality required to complete the execution of an application. However, these architectures suffer from slow run time reconfiguration (RTR) due to the fact that the configuration memory resides off-chip and hence requires high access latency. This disadvantage limits the system performance and the application domain in which reconfigurable systems could prove effective. To overcome slow RTR, recent approaches include on-chip configuration memory to cache the next possible configurations. This approach trades off die area for fast RTR which diminishes the processing power of the reconfigurable processor. The high cost of adding configuration cache, up to 50% of the die area, would considerably increase the number of hardware reconfigurations required compared to architectures without on-chip cache. This paper presents an alternative reconfigurable architecture which overcomes these limitations by exploiting high bandwidth optical channels. We develop a performance model to analyze and compare the performance of cache based RTR architectures, optical based RTR architectures and hybrid optical-cache based RTR architectures.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124117240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast partial reconfiguration for FCCMs","authors":"S. Sezer, Roger Francis Woods, J. Heron, A. Marshall","doi":"10.1109/FPGA.1998.707934","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707934","url":null,"abstract":"The emergence of new FPGA families such as the Xilinx 6200 FPGA family and the Atmel 40000 series has been an important development in FPGAs for Custom Computing Machines (FCCMs). These devices have a number of appealing features when compared to other technologies such as the Xilinx 4000 series SRAM technology. These can be characterised as follows: faster reconfiguration (typically ms or μs), support for partial reconfiguration, and a dedicated microprocessor interface. An approach for run-time reconfiguration can be achieved by considering a range of functions collectively and developing the specific circuit architectures for each so that a high degree of commonality exists between them in terms of their structure, wiring and cell function. This is done by representing the functions or algorithms using Signal Flow Graphs (SFGs) and manipulating them to produce similar graphs for different functions. This basic concept can only be exploited through the development of an efficient hardware system. This revolves around the concept of virtual hardware which is integrated within the operating system and is supported by programming languages such as C and C++. The reconfigurable designs, which allow partial re-configuration, are stored within a configuration data graph. Whilst this allows the configuration data to be efficiently stored, reconfiguration state graphs are used for high speed reconfiguration. The entire software hardware system for fast partial reconfiguration is illustrated.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130035618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A stream-based configurable computing radio testbed","authors":"Steven F. Swanchara, S. Harper, P. Athanas","doi":"10.1109/FPGA.1998.707879","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707879","url":null,"abstract":"Software radios have emerged as important tools in the development of new signal processing algorithms, networking protocols, and propagation experiments in wireless environments. With a software radio, the signal processing and modem management can be changed in a timely manner. This enables users to explore a variety of domains before committing a design to silicon. Contemporary DSP-based software radios have proven indispensable in the wireless research community. The capabilities of these platforms, however, are somewhat limited due to the confined computational capacity of current DSPs. This paper presents a stream-based CCM soft radio consisting of a variety of FPGAs, memories, and a programmable radio interface, providing a pathway from antenna to wired network. CCM techniques are well suited for the computations associated with wireless modem management, and provide a substantial computational margin over contemporary alternatives, allowing live experiments using advanced digital signal processing.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129293554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A quantitative analysis of reconfigurable coprocessors for multimedia applications","authors":"T. Miyamori, K. Olukotun","doi":"10.1109/FPGA.1998.707876","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707876","url":null,"abstract":"Recently, computer architectures that combine a reconfigurable (or retargetable) coprocessor with a general-purpose microprocessor have been proposed. These architectures are designed to exploit large amounts of fine grain parallelism in applications. In this paper, we study the performance of the reconfigurable coprocessors on multimedia applications. We compare a Field Programmable Gate Array (FPGA) based reconfigurable coprocessor with the array processor called REMARC (Reconfigurable Multimedia Array Coprocessor). REMARC uses a 16-bit simple processor that is much larger than a Configurable Logic Block (CLB) of an FPGA. We have developed a simulator, a programming environment, and multimedia application programs to evaluate the performance of the two coprocessor architectures. The simulation results show that REMARC achieves speedups ranging from a factor of 2.3 to 7.3 on these applications. The FPGA coprocessor achieves similar performance improvements. However, the FPGA coprocessor needs more hardware area to achieve the same performance improvement as REMARC.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127537855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping homogeneous computations onto dynamically configurable coarse-grained architectures","authors":"Andreas Dandalis, V. Prasanna","doi":"10.1109/FPGA.1998.707933","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707933","url":null,"abstract":"FPGAs are fine-grained architectures, mainly designed for implementing bit-level tasks and random logic functions. Their performance is limited for computationally demanding applications over large word length data. A highly promising avenue that is being explored by many research groups is coarse-grained configurable architectures. These architectures are datapath-oriented structures and consist of a small number of powerful, word-based configurable processing elements (PEs). Such architectures can result in greater computational efficiency and high throughput for coarse-grained computing tasks. The key to achieving high performance solutions is efficient mapping of tasks onto the above architectures. In addition to achieving high computational rates, partitionability is a desirable characteristic of the mapping. Moreover, the computational efficiency must scale with the size of the architecture. Finally, it must result in a simple PE structure, regular/balanced dataflow and sustainable I/O requirements so that it can be realized in hardware. In this paper we show a methodology for deriving dynamic computation structures for 2 dimensioned homogeneous computations. Homogeneous computations lead to all PEs having the same functionality. The derived dynamic structures match the datapath-oriented nature of coarse-grained architectures and lead to efficient mapping schemes. Our solutions require constant I/O and a smaller amount of local memory per PE compared with known solutions.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133588510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-sorting radix-2 FFT on FPGAs using parallel pipelined distributed arithmetic blocks","authors":"Manoucher Shaditalab, G. Bois, M. Sawan","doi":"10.1109/FPGA.1998.707943","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707943","url":null,"abstract":"The design and implementation of a parallel pipelined Fast Fourier Transform (FFT) using the Decimation in Frequency (DIF) algorithm on FPGAs is presented. The FFT core for 1024 complex data points is implemented on the X-CIM, which is a Re-configurable Acceleration Subsystem (RAS) with a TMS320C4x DSP-processor and two XC4013 FPGAs as its processing units. The proposed FFT machine is an alternative to the bit serial-parallel FFT algorithm using the Distributed Arithmetic Look Up Table (DALUT) method. The advantage of the proposed design lies mainly in its cost-effective and hardware-efficient parallel implementations of the N-point DFT, offering highly attractive throughput rates in relation to conventional DSP processors. Moreover, the processor's data-path structure is independent of the sampled data points, and it has a self-sorting property where the output is in properly ordered form. Our goal is to improve size-performance requirements of an FFT core function using modular and hierarchical VHDL description combined with IP-core library elements from Xilinx.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130406229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Adobe Photoshop with reconfigurable logic","authors":"Satnam Singh, Robert Slous","doi":"10.1109/FPGA.1998.707901","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707901","url":null,"abstract":"This paper presents the results of a project designed to produce a commercial application for reconfigurable logic. We describe how we took the popular image processing application Adobe Photoshop and used its plug-in technology to devise a set of FPGA-based filters to accelerate colour space conversion and image convolution operations. Some of the barriers that make it difficult to produce portable FPGA-based filters are explored.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133407714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization and parameterization of a pipeline reconfigurable FPGA","authors":"M. Moe, H. Schmit, S. Goldstein","doi":"10.1109/FPGA.1998.707923","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707923","url":null,"abstract":"The article defines a class of architectures for pipeline reconfigurable FPGAs by parameterizing a generic model. This class of architecture is sufficiently general to allow exploration of the most important design trade-offs. The parameters include the word size and LUT size, the number of global busses and registers associated with each logic block, and the horizontal interconnect within each stripe. We have developed an area model for the architecture that allows us to quickly estimate the area of an instance of the architectural class as a function of the parameter values. We compare the estimates generated by this model to one instance of the architecture that we have designed and fabricated.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"2022 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115612303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DES key breaking, encryption and decryption on the XC6216","authors":"T. Kean, Ann Duncan","doi":"10.1109/FPGA.1998.707930","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707930","url":null,"abstract":"Given the security issues arising in the electronic transfer of data, hardware acceleration of cryptographic functions is crucial for large amounts of data. Encryption, decryption and keybreaking of the Data Encryption Standard, DES, are discussed. Prototype designs were realized and tested on the XC6200DS PCI Development System.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"10 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126100063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An architecture simulator for National Semiconductor's adaptive processing architecture (NAPA)","authors":"J. Arnold","doi":"10.1109/FPGA.1998.707912","DOIUrl":"https://doi.org/10.1109/FPGA.1998.707912","url":null,"abstract":"Early simulation is a very important tool in the development of any large scale system. Accuracy and flexibility are critical characteristics which allow the architect to explore the design tradeoff space. Moreover, in many systems, especially those for reconfigurable computing, a good simulation environment will continue to be used long after the architecture solidifies, serving a variety of roles including as a platform for the development of run time systems, programming tools, benchmarks, and even end applications. Therefore, visibility, controllability and user interface are also important design considerations. National Semiconductor's Adaptive Processing Architecture (NAPA) integrates a Fixed Instruction set Processor (FIP), an Adaptive Logic Processor (ALP), memory and other support circuitry into a single reconfigurable computing device. The NAPA architecture simulator, NAPAsim, consists of a C language, cycle accurate model of the RISC core, peripherals and memories, coupled with an event driven logic simulator for modelling the user-defined contents of the reconfigurable logic and a Tcl/Tk based GUI to provide source level symbolic debugging capabilities. NAPAsim was developed to serve as both a tool for architectural exploration and as a platform for system and application software development.","PeriodicalId":309841,"journal":{"name":"Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131384690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}