2008 International Conference on Field-Programmable Technology: Latest Publications

Dynamically programmable Reed Solomon processor with embedded Galois Field multiplier
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-10 DOI: 10.1109/FPT.2008.4762395
A. El-Rayis, Xin Zhao, T. Arslan, A. Erdogan
Abstract: This work presents a novel reconfigurable Galois field multiplier embedded in a dynamically reconfigurable processor for a real-time programmable Reed-Solomon (RS) encoder and decoder targeting various communication standards. The fundamental operation in Reed-Solomon encoding and decoding is multiplication over a Galois field (GF). The reconfigurable GF multiplier, with single instruction multiple data (SIMD) support, is presented here as an instruction set extension to the processor. The processor supports programmable RS coding over GF(2^8) with its sixteen primitive polynomials and for all supported data block sizes. Various optimization techniques have been applied to enhance the processor throughput. The throughput achieved for RS(204,188) is up to 202 Mbps for the encoder, demonstrating a future-proof, flexible design.
Citations: 6
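The abstract above names multiplication over GF(2^8) as the fundamental operation in RS coding. As a rough illustration only, the following Python sketch shows a bit-serial software version of that multiplication. It is not the paper's hardware multiplier: the function name `gf256_mul` and the default field polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) are assumptions chosen for this example, and the `poly` argument merely mimics the idea of selecting a field polynomial at run time.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply a and b in GF(2^8), reducing modulo the degree-8 polynomial `poly`.

    `poly` is given with its x^8 term included (for example 0x11D).
    The default value is an assumption made for this sketch.
    """
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a   # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1           # multiply the running term by x
        if a & 0x100:
            a ^= poly     # degree reached 8: reduce modulo the field polynomial
    return result


if __name__ == "__main__":
    # x * x^7 = x^8, which reduces to x^4 + x^3 + x^2 + 1 under 0x11D.
    print(hex(gf256_mul(0x02, 0x80)))
```

A hardware multiplier would typically unroll this loop into combinational logic; the loop form is used here only because it is easy to read.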
Design and implementation of a high performance financial Monte-Carlo simulation engine on an FPGA supercomputer
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-10 DOI: 10.1109/FPT.2008.4762369
Xiang Tian, K. Benkrid
Abstract: Monte-Carlo simulation is a very widely used technique in scientific computing, with large computational benefits for problems where closed-form solutions are impossible to derive. The technique is also characterized by a high degree of parallelism, because a large number of different simulation paths need to be calculated, which makes it ideal for a parallel hardware implementation. This paper illustrates the benefits of such an implementation in the context of financial computing by implementing a financial Monte-Carlo simulation engine on an FPGA-based supercomputer, called Maxwell, developed at the University of Edinburgh. Maxwell consists of a 32-CPU cluster augmented with 64 Xilinx Virtex-4 FPGAs connected in a 2D torus. The engine can implement various Monte-Carlo simulations on the Maxwell machine with speed-ups of about three orders of magnitude compared to equivalent software implementations. This is illustrated with an implementation of the Black-Scholes option pricing model. Real hardware implementation shows that the FPGA-based implementation of the Black-Scholes model outperforms an equivalent software implementation running on a workstation cluster with the same number of computing nodes (CPU/FPGA) by a factor of 750, which the authors report as the fastest FPGA implementation of this model published to date.
Citations: 37
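To make the parallelism argument in the abstract above concrete, here is a minimal software sketch of Monte-Carlo pricing of a European call under the Black-Scholes model. It is a plain Python illustration, not the paper's FPGA engine. The function names, the parameter values in the usage lines, and the choice of a European call are all assumptions made for this example. Each loop iteration is an independent path, which is the property that makes the method map well onto parallel hardware.

```python
import math
import random


def mc_european_call(s0, strike, rate, sigma, maturity, n_paths, seed=0):
    """Estimate a European call price by simulating terminal asset prices."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                 # one standard normal draw per path
        s_t = s0 * math.exp(drift + vol * z)    # terminal price under geometric Brownian motion
        payoff_sum += max(s_t - strike, 0.0)    # call payoff for this path
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def bs_call_closed_form(s0, strike, rate, sigma, maturity):
    """Black-Scholes closed-form call price, used only as a sanity check."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


if __name__ == "__main__":
    # Illustrative parameters only; they do not come from the paper.
    estimate = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=200_000)
    exact = bs_call_closed_form(100.0, 100.0, 0.05, 0.2, 1.0)
    print(estimate, exact)
```

The closed-form price exists for this simple payoff, so it serves as a check on the simulation; Monte-Carlo methods earn their cost on payoffs where no such formula is available.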
An FPGA-specific approach to floating-point accumulation and sum-of-products
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-07 DOI: 10.1109/FPT.2008.4762363
F. de Dinechin, B. Pasca, O. Creţ, R. Tudoran
Abstract: This article studies two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators that are more efficient and more accurate than those offered by processors and GPUs. First, for applications involving the addition of a large number of floating-point values, an ad-hoc accumulator is proposed. By tailoring its parameters to the numerical requirements of the application, it can be made arbitrarily accurate, at an area cost comparable to that of a standard floating-point adder, and at a higher frequency. The second example is the sum-of-products operation, which is the building block of matrix computations. A novel architecture is proposed that feeds the previous accumulator from a floating-point multiplier whose rounding logic has been removed, again improving the area/accuracy tradeoff. These architectures are implemented within the FloPoCo generator, freely available under the LGPL.
Citations: 20
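One way to picture an accumulator whose accuracy is set by application-chosen parameters is to add every summand into a wide fixed-point register instead of a floating-point one. The Python sketch below is an interpretation under that assumption, not the architecture from the paper: the names `fixed_point_sum` and `naive_sum`, the `lsb_exp` parameter, and the use of Python's unbounded integers in place of a fixed-width register are all choices made for this illustration. Inputs are assumed to be finite.

```python
import math


def fixed_point_sum(values, lsb_exp=-60):
    """Sum floats in an integer accumulator whose least significant bit weighs 2**lsb_exp."""
    acc = 0  # Python ints are unbounded; hardware would fix the register width
    for v in values:
        mant, exp = math.frexp(v)       # v == mant * 2**exp with 0.5 <= |mant| < 1
        m = int(mant * (1 << 53))       # exact: a double carries at most 53 mantissa bits
        shift = exp - 53 - lsb_exp      # align the mantissa with the accumulator LSB
        if shift >= 0:
            acc += m << shift
        else:
            acc += m >> -shift          # bits below the accumulator LSB are dropped
    return math.ldexp(float(acc), lsb_exp)  # single rounding back to a double


def naive_sum(values):
    """Plain left-to-right floating-point accumulation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16]
    print(naive_sum(data), fixed_point_sum(data))
```

With this input the naive loop loses the 1.0 when it is added to 1e16, while the fixed-point accumulator keeps it because no rounding happens until the final conversion.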
A run-length based connected component algorithm for FPGA implementation
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-07 DOI: 10.1109/FPT.2008.4762381
Kofi Appiah, A. Hunter, P. Dickinson, Jonathan Owens
Abstract: This paper introduces a real-time connected component labelling algorithm designed for field programmable gate array (FPGA) implementation. The algorithm run-length encodes the image and performs connected component analysis on this representation. The run-length encoding, together with other parts of the algorithm, is performed in parallel; sequential operations are minimized because the number of runs is typically smaller than the number of pixels. The architecture is built mainly on the Block RAM (i.e. internal RAM) of the FPGA. A comparison with the multi-pass algorithm in hardware and software is presented to show the advantages of the algorithm. The algorithm runs comfortably in real time with reasonably low resource utilization, making integration with other real-time algorithms feasible.
Citations: 38
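The idea of labelling runs rather than pixels, as described in the abstract above, can be sketched in software as follows. This is a sequential Python illustration, not the parallel Block RAM architecture from the paper. The function names, the union-find bookkeeping, and the choice of 4-connectivity are assumptions made for this example.

```python
def row_runs(row):
    """Return (start, end) column pairs, end exclusive, for each run of nonzero pixels."""
    runs, start = [], None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x
        elif not pixel and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def count_components(image):
    """Count 4-connected foreground components by merging overlapping runs."""
    parent = []  # union-find forest with one node per run

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    prev = []  # runs of the previous row as (start, end, label)
    for row in image:
        cur = []
        for start, end in row_runs(row):
            label = len(parent)
            parent.append(label)
            for p_start, p_end, p_label in prev:
                if p_start < end and start < p_end:      # column ranges overlap
                    parent[find(p_label)] = find(label)  # merge the two components
            cur.append((start, end, label))
        prev = cur
    return len({find(i) for i in range(len(parent))})


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 1, 0],
        [1, 1, 1, 0, 0],
    ]
    print(count_components(image))
```

The merge work is proportional to the number of runs per row rather than the number of pixels, which is the saving the abstract points to.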
An area-efficient FPGA realisation of a codebook-based image compression method
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-01 DOI: 10.1109/FPT.2008.4762415
P. Zipf, H. Hinkelmann, Hui Shao, R. Dogaru, M. Glesner
Abstract: We present a hardware implementation of an efficient image compression method optimised for small FPGAs. The compression method is based on a codebook of reference patterns to support multiplication-free quantisation of the image data. Based on specific features of a low-cost FPGA architecture, a pipelined implementation is developed and evaluated. The implemented hardware benefits from the simple structure of the compression method and is optimised for area and performance. The realised hardware and the underlying compression mechanism are described, and the synthesis results for different model variants are compared. The results show that a high compression rate is possible at extremely low hardware cost, and that a high frame rate can be obtained even on a low-cost FPGA.
Citations: 3
A scalable reconfiguration mechanism for fast dynamic reconfiguration
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-01 DOI: 10.1109/FPT.2008.4762377
H. Hinkelmann, P. Zipf, M. Glesner
Abstract: Hardware reconfiguration during run-time provides attractive features such as fast adaptivity, high hardware utilisation, and low area consumption due to efficient reuse of hardware components. In this paper, a novel multi-layered reconfiguration mechanism is proposed that allows frequent dynamic reconfiguration at very low latencies. It combines successful existing techniques, such as multi-context and partial reconfiguration, with new ideas, such as tag-matching and reconfiguration profiles, into one uniform approach. An important feature of the proposed mechanism is that it scales well and can be adapted easily to given hardware structures, making it applicable to virtually any reconfigurable fabric. In contrast to many existing techniques, it also supports very heterogeneous architectures, found for instance in custom reconfigurable systems. Experimental results show that the mechanism provides significantly lower reconfiguration latencies than some common existing techniques.
Citations: 4
A profiler for a heterogeneous multi-core multi-FPGA system
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-01 DOI: 10.1109/FPT.2008.4762373
Daniel Nunes, Manuel Saldaña, P. Chow
Abstract: Understanding the behavior of an application is rarely a trivial task, due to the complexity of the system in which the application is executed and the complexity of the application itself. The task becomes even more troublesome if the application runs in a parallel environment, where the relationships between the individual executions must be known to understand the application's behavior. FPGA flexibility increases the complexity of such tasks by allowing not only changes to the application, to adapt it to the hardware, but also tailoring of the hardware to a specific application. To take full advantage of these systems, a tool that helps the user understand an application is paramount. In this paper, we present a profiler for the TMD, a heterogeneous multi-core multi-FPGA system designed at the University of Toronto. The profiler can be configured for a specific application running on a specific hardware configuration. It allows retrieval of all communication calls and any user state defined by instrumentation of the source code. We test the profiler with two simple case studies: MPI Barrier, where we compare a sequential algorithm with a binary tree algorithm, and a heat equation solver that uses the Jacobi iteration method, where we compare blocking with non-blocking MPI calls.
Citations: 16
Leakage power reduction for coarse grained dynamically reconfigurable processor arrays with fine grained Power Gating technique
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-01 DOI: 10.1109/FPT.2008.4762410
Yoshiki Saito, T. Shirai, Takuro Nakamura, T. Nishimura, Y. Hasegawa, S. Tsutsumi, Toshihiro Kashima, M. Nakata, S. Takeda, K. Usami, H. Amano
Abstract: One of the benefits of a coarse grained dynamically reconfigurable processor array (DRPA) is its low dynamic power consumption, achieved by operating a number of processing elements (PEs) in parallel at a low clock frequency. However, in future advanced processes, leakage power will account for a considerable part of the total power consumption, and it may erode the advantage of DRPAs. To reduce the leakage power, fine grained Power Gating (PG) is applied to a DRPA, MuCCRA-2.32b, and the leakage power and area overhead are measured. We evaluated the effect of two control modes, Pair and Unit Individual, based on layout design and real applications. The results indicate that by applying PG to the ALUs and SMUs in the PEs individually, leakage power can be reduced by 48% with an area overhead of 9.0%.
Citations: 21
Optimised single pass connected components analysis
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-01 DOI: 10.1109/FPT.2008.4762382
Ni Ma, D. Bailey, C. T. Johnston
Abstract: Classical connected components labelling algorithms are unsuitable for real-time processing of streamed images on an FPGA because they require two passes through the image. Recently, a single-pass algorithm was proposed that avoided the need to buffer an intermediate image. In this paper, a new single-pass algorithm is described that is a considerable improvement over the existing algorithms. The new algorithm reassigns and reuses labels on each row to minimise the size of both the equivalence and region data tables. The optimised single-pass algorithm reduces the worst-case memory requirement by a factor of more than 100 relative to the original algorithm (for measuring region area), and reduces the latency to only 1 row.
Citations: 82
Automatic generation of decomposition based matrix inversion architectures
2008 International Conference on Field-Programmable Technology Pub Date : 2008-12-01 DOI: 10.1109/FPT.2008.4762421
A. Irturk, Bridget Benson, A. Arfaee, R. Kastner
Abstract: Matrix inversion is an essential computation for various algorithms employed in multi-antenna wireless communication systems. FPGAs are ideal platforms for wireless communication; however, the need for vast amounts of customization throughout the design process of a matrix inversion core can overwhelm the designer. Decomposition methods provide the analytic simplicity and computational convenience necessary for computationally intensive matrix inversion. This paper presents automatic generation of different decomposition based matrix inversion architectures using a matrix inversion core generator tool, GUSTO, with different parameterization options. We present automatic generation of a variety of general purpose matrix inversion architectures that achieve results comparable to published matrix inversion implementations, while giving the designer the ability to study the tradeoffs between architectures with different design parameters.
Citations: 14
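For readers unfamiliar with decomposition based inversion, the sketch below inverts a small matrix by LU decomposition with partial pivoting followed by forward and back substitution. The abstract above does not say which decompositions GUSTO supports, so LU is used here purely as one common example. The function name `lu_inverse` and the list-of-lists representation are assumptions for this illustration, and the code is a software sketch rather than a description of the generated hardware.

```python
def lu_inverse(a):
    """Invert a square matrix (list of lists) via LU decomposition with partial pivoting."""
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # working copy holding both L and U
    perm = list(range(n))                        # row r of P*A is row perm[r] of A
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]                 # multiplier, stored in the L part
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]  # eliminate below the pivot
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Forward substitution: solve L y = P e_col (L has an implicit unit diagonal).
        y = [0.0] * n
        for r in range(n):
            rhs = 1.0 if perm[r] == col else 0.0
            y[r] = rhs - sum(lu[r][c] * y[c] for c in range(r))
        # Back substitution: solve U x = y; x is column `col` of the inverse.
        for r in reversed(range(n)):
            s = sum(lu[r][c] * inv[c][col] for c in range(r + 1, n))
            inv[r][col] = (y[r] - s) / lu[r][r]
    return inv


if __name__ == "__main__":
    print(lu_inverse([[4, 3], [6, 3]]))
```

Once the factors are available, each column of the inverse costs only two triangular solves, which is the kind of regular structure that suits a generated datapath.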