2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14): Latest Papers, Page 5

Area-efficient dynamically reconfigurable protocol-processing-hardware for access network communications SoC

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032501

Saki Hatta, N. Tanaka, S. Shigematsu

Citations: 1

A high-level analysis of a multi-core vision processor using SystemC and TLM2.0

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032491

J. Y. Mori, M. Hübner

Abstract: Vision processors are integrated circuits that aim to combine sensors and processing elements on the same chip. A designer may need to take several constraints into account when developing a vision processor: available technology, power consumption, thermal management, fault tolerance, speed, silicon area, and application-specific needs. Most of these vision processors are based on analog circuits and can perform only low-level processing, such as filtering and contrast adjustment. Digital processing elements allow for more programmability in such systems; however, the approaches found in the literature do not explore the integration of sensor and processing elements in an efficient way. In addition, it is envisioned that vision processors can take advantage of recent multi/many-core advances. In this work, a full integration is analyzed, exploring the spatial distribution of sensors and processors. All the design blocks were developed in SystemC with the TLM2.0 standard, in order to allow for a better ESL analysis. Pure loosely-timed (LT) and mixed loosely-timed/approximately-timed (LT/AT) models are explored to extract information about parallelism in data transfers and operations. An application with some well-known algorithms is analyzed for a variable number of cores, in order to validate the tool-set and the methodology used.

Citations: 7

Net reordering and multicommodity flow based global routing for FPGAs

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032540

Cristinel Ababei, R. Kavasseri, M. Zare

Abstract: The most popular algorithm for solving the routing problem for field-programmable gate arrays (FPGAs) has remained virtually the same for the past two decades. It is essentially an iterative maze-routing technique, such as Dijkstra's algorithm, applied to each net in the circuit repeatedly. During multiple routing iterations, nets are ripped up and rerouted via different paths to resolve competition for routing resources or to improve circuit delay. The most popular implementation of such a routing approach is the PathFinder algorithm used inside the VPR tool [1]. The quality of the routing solution, however, depends on the order in which nets are processed during each routing iteration. This is commonly referred to as the net ordering problem. PathFinder addresses this problem through continuous updates of the cost associated with overusing routing resources. After each routing iteration, the cost of overusing a routing resource is increased based on the routing so far, so that the probability of resolving all congestion during future iterations increases. To further address the net ordering problem, this paper investigates the effectiveness of two combined techniques to enhance PathFinder. First, we change the order in which nets are ripped up and rerouted to give higher priority to nets with two, three, and more than eleven pins, because these nets have the largest impact on the quality of the routing solution. Second, we alter the cost calculation during wave expansions for two-pin nets based on the global routing solution obtained by solving an equivalent multicommodity flow problem. Preliminary results suggest that conventional FPGA routing solutions can still be improved.

Citations: 0
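
The routing abstract above describes two mechanisms that a short sketch can make concrete: reordering nets so that those with two, three, or more than eleven pins are routed first, and raising the cost of overused routing resources after each iteration. The Python below is only an illustration under stated assumptions. The `Net` class, the function names, and the `pres_fac`/`hist_fac` parameters are hypothetical, and the cost expression is the commonly cited negotiated-congestion form `(base + history) * present`, not the modified cost calculation the paper derives from a multicommodity flow solution.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    """Hypothetical net record: a name and its number of pins."""
    name: str
    num_pins: int


def is_priority(net: Net) -> bool:
    # Pin counts singled out in the abstract: two, three, or more than eleven.
    return net.num_pins in (2, 3) or net.num_pins > 11


def reorder_nets(nets: list[Net]) -> list[Net]:
    # sorted() is stable, so nets keep their original relative order
    # within the priority group and within the remaining group.
    return sorted(nets, key=lambda net: 0 if is_priority(net) else 1)


def node_cost(base: float, history: float, occupancy: int,
              capacity: int, pres_fac: float) -> float:
    # Assumed textbook form of the congestion cost: (base + history) * present.
    # The present-congestion term grows if adding one more net would
    # exceed the capacity of this routing resource.
    overuse = max(0, occupancy + 1 - capacity)
    present = 1.0 + pres_fac * overuse
    return (base + history) * present


def update_history(history: float, occupancy: int,
                   capacity: int, hist_fac: float) -> float:
    # After a routing iteration, overused resources accumulate history cost,
    # which makes them less attractive in later iterations.
    return history + hist_fac * max(0, occupancy - capacity)


if __name__ == "__main__":
    nets = [Net("a", 5), Net("b", 2), Net("c", 12), Net("d", 3), Net("e", 11)]
    print([net.name for net in reorder_nets(nets)])
    print(node_cost(base=1.0, history=0.5, occupancy=1, capacity=1, pres_fac=0.5))
```

A stable sort is used so that any ordering already present in the net list is preserved inside each group; only the grouping itself reflects the priority rule stated in the abstract.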

3D-LeukoNoC: A dynamic NoC protection

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032485

Martha Johanna Sepúlveda, G. Gogniat, Daniel Flórez, J. Diguet, C. Pedraza, M. Strum

Citations: 5

Place Reservation technique for online task placement on a multi-context heterogeneous reconfigurable architecture

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032553

Quang-Hoa Le, E. Casseau, A. Courtay

Citations: 3

Characterization of OpenCL on a scalable FPGA architecture

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032505

Shanyuan Gao, Jeremy Chritz

Abstract: The recent release of Altera's SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown performance improvements brought by OpenCL using a single FPGA device. However, to meet the objectives of high-performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work proposes a scalable FPGA architecture for high-performance computing. The design includes multiple FPGA modules and a high-performance backplane. The modular nature of this architecture supports the combination of different FPGAs and allows for easy hardware updates. FPGA modules based on Stratix V are compatible with Altera's OpenCL tool flow. The evaluation tested the native I/O performance of the architecture, and the results demonstrate scalability using six FPGAs. The host-to-device peak bandwidth is measured as 13.1 GB/s for read operations and 12.1 GB/s for write operations. The FPGA-to-memory bandwidth is measured as 64.5 GB/s in total. An OpenCL AES kernel is selected to test the scalable multi-FPGA architecture. The test results show that peak throughput is achieved when six FPGAs are used. The throughput per watt shows a 5× improvement using four FPGAs over a general-purpose processor.

Citations: 22

Mission control: A performance metric and analysis of control logic for pipelined architectures on FPGAs

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032539

Sam Skalicky, S. López, M. Lukowiak, Christopher A. Wood

Citations: 3

A power-efficient real-time architecture for SURF feature extraction

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032492

C. Wilson, P. Zicari, S. Craciun, P. Gauvin, E. Carlisle, A. George, H. Lam

Abstract: This paper presents a novel FPGA-based architecture for the Speeded-Up Robust Features (SURF) extractor. By leveraging the inherent parallelism of the SURF algorithm, we designed a fully pipelined architecture implemented on the FPGA fabric of a Xilinx Zynq-7020 device (XC7Z020CLG484-1). Compared with other high-performing SURF designs in the literature, our implementation achieved the highest frame rate (131.36 fps) while compactly fitting on a single device and consuming only 0.608 W of average power. An experimental platform featuring a 640×480 resolution camera was used to compare the proposed design with OpenSURF, a widely used open-source C++ library, running on a high-end Intel i7 processor. Our system achieved real-time performance independent of the number of interest points extracted from the targeted image, and consistently outperformed the SURF software baseline, reaching a maximum speedup of 15×. An extensive analysis was conducted to show that the proposed architecture is as robust as the SURF algorithm to image transformations (rotation and scaling) and image distortions (blurring and pixelation), demonstrating that interest-point repeatability was maintained under varying viewing conditions.

Citations: 15

Enabling FPGA support in Matlab based heterogeneous systems

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032515

Sam Skalicky, Tyler Kwolek, S. López, M. Lukowiak

Abstract: FPGAs have been shown to provide orders-of-magnitude improvement over CPUs and GPUs in terms of absolute performance and energy efficiency for various kernels, such as Cholesky decomposition, matrix inversion, and FFT, among others. Despite this, the overall performance of many applications suffers when they are implemented entirely in FPGAs. Combining FPGAs with CPUs and GPUs provides the range of capabilities needed to support the diverse computational requirements of applications. Integrating FPGAs into these systems challenges application developers with constructing hardware kernel implementations and with interfacing from the low-level hardware logic in the FPGA to the high-speed networks that connect processors in the system. In this work we extend the compute capabilities of Matlab by incorporating support for FPGAs and automating the parallel code generation. We characterize the system and evaluate the performance gains that can be achieved by adding the FPGA for two compute-intensive applications. We present performance results for medical imaging and fluid dynamics applications implemented in a CPU+GPU+FPGA system, which achieved up to 40× improvement compared to the standard Matlab CPU+GPU environment.

Citations: 1

TNT10G: A high-accuracy 10 GbE traffic player and recorder for multi-Terabyte traces

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032561

J. F. Zazo, Marco Forconesi, S. López-Buedo, G. Sutter, J. Aracil

Abstract: In this paper we present TNT10G (multi-Terabyte trace Network Tester), an FPGA-based tool for replaying and capturing massive Ethernet traces at 10 Gb/s. The tool is capable of reproducing and storing terabytes of network traffic at line rate, even when small packets are used. Moreover, since the design works at a low level (XGMII), accuracy is better than 10 ns, and it is also possible to observe and generate anomalous conditions, such as malformed frames, FCS errors, or illegal inter-frame gaps. All such features make TNT10G a truly useful tool for network testing and monitoring at 10 Gb/s. The design uses the NetFPGA-10G platform, although it could be easily ported to other boards since it uses standard AXI buses. The key element in achieving line-rate operation is a custom-developed Linux driver, which works in conjunction with a high-speed DMA backend core from Northwest Logic. These blocks, together with a RAID0 array of commodity SSDs, enable operation at 10 Gb/s. Finally, the use of a low-cost academic board together with off-the-shelf components allows for an open, extensible, and cost-effective solution, a unique combination not found in commercial products.

Citations: 12
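
The TNT10G abstract stresses line-rate operation "even when small packets are used" and timing accuracy better than 10 ns. A rough calculation shows why small frames are the demanding case. The sketch below assumes standard Ethernet framing overheads (an 8-byte preamble plus start-of-frame delimiter and a 12-byte minimum inter-frame gap). These constants and all function names are illustrative assumptions and are not taken from the paper.

```python
LINE_RATE_BPS = 10_000_000_000  # nominal 10 GbE data rate in bits per second
PREAMBLE_SFD_BYTES = 8          # assumed: 7-byte preamble + 1-byte start-of-frame delimiter
MIN_IFG_BYTES = 12              # assumed: minimum inter-frame gap


def wire_bytes(frame_bytes: int) -> int:
    # Bytes occupied on the link per frame, including preamble and gap.
    # frame_bytes is the full frame size, FCS included (64 for a minimum frame).
    return frame_bytes + PREAMBLE_SFD_BYTES + MIN_IFG_BYTES


def packets_per_second(frame_bytes: int) -> float:
    # Maximum back-to-back frame rate at line rate.
    return LINE_RATE_BPS / (wire_bytes(frame_bytes) * 8)


def frame_slot_ns(frame_bytes: int) -> float:
    # Time between the starts of two back-to-back frames, in nanoseconds.
    return wire_bytes(frame_bytes) * 8 * 1e9 / LINE_RATE_BPS


if __name__ == "__main__":
    for size in (64, 512, 1518):
        print(f"{size:5d} B frames: {packets_per_second(size) / 1e6:6.2f} Mpps, "
              f"{frame_slot_ns(size):8.1f} ns per frame slot")
```

Under these assumptions, minimum-size frames arrive roughly every 67 ns, at close to 15 million frames per second, so a timestamp resolution of about 10 ns is fine enough to distinguish gaps between consecutive small frames.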