2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14): latest papers, page 7

400 Gbps energy-efficient multi-field packet classification on FPGA

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032486

Shijie Zhou, Sihan Zhao, V. Prasanna

Abstract: Packet classification is a network kernel function that has been widely researched over the past decade. However, most previous work has focused only on achieving high throughput without considering energy efficiency. With the rapid growth of the Internet, energy efficiency has become an important metric for networks. We present the design of an energy-efficient packet classifier on Field-Programmable Gate Arrays (FPGAs). The classifier is arranged as a 2-dimensional array of processing elements to enable sustained high throughput. We developed a memory activation scheduling technique that significantly reduces memory power dissipation by selectively activating memory blocks. We conducted experiments using real-life rule sets and packet traces to evaluate our design. The experimental results show that, with the memory activation scheduling technique, our design achieves 1.8× greater energy efficiency than a baseline implementation without this optimization. With 6 individual classifiers on a single chip and a rule set of size 1K, our design sustains a throughput of 400 Gbps for minimum-size (40-byte) packets and can process over 100 Gbps network traffic per Joule. Compared with state-of-the-art solutions, we achieve over 1.7× improvement in energy efficiency.
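To put the headline figure in perspective, the packet rate implied by 400 Gbps at 40-byte packets can be worked out directly. The sketch below is only arithmetic on the numbers quoted in the abstract; the even split across the 6 classifiers is an assumption made here for illustration, and framing overhead is ignored.

```python
def packets_per_second(throughput_gbps: float, packet_bytes: int) -> float:
    """Packet rate needed to sustain a line rate with fixed-size packets."""
    bits_per_packet = packet_bytes * 8
    return throughput_gbps * 1e9 / bits_per_packet


# Figures quoted in the abstract: 400 Gbps, 40-byte packets, 6 classifiers.
total_pps = packets_per_second(400, 40)

# Assumption (not stated in the abstract): the load is shared evenly.
per_classifier_pps = total_pps / 6

print(f"total: {total_pps / 1e9:.2f} billion packets/s")
print(f"per classifier: {per_classifier_pps / 1e6:.1f} million packets/s")
```

Under these assumptions the chip as a whole must classify 1.25 billion packets per second.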

Citations: 2

Deferring accelerator offloading decisions to application runtime

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032509

G. Vaz, Heinrich Riebler, Tobias Kenter, Christian Plessl

Abstract: Reconfigurable architectures provide an opportunity to accelerate a wide range of applications, frequently by exploiting data parallelism, where the same operations are executed homogeneously on a (large) set of data. However, when the sequential code is executed on a host CPU and only data-parallel loops are executed on an FPGA coprocessor, a sufficiently large number of loop iterations (trip count) is required so that the control- and data-transfer overheads to the coprocessor can be amortized. The trip count of large data-parallel loops is frequently not known at compile time, but only at runtime just before entering the loop. Therefore, we propose to generate code for both the CPU and the coprocessor, and to defer the decision of where to execute the code to application runtime, when the trip count of the loop can be determined. We demonstrate how an LLVM-compiler-based toolflow can automatically insert appropriate decision blocks into the application code. Analyzing popular benchmark suites, we show that this kind of runtime decision is often applicable. The practical feasibility of our approach is demonstrated by a toolflow that automatically identifies loops suitable for vectorization and generates code for the FPGA coprocessor of a Convey HC-1. The toolflow adds decisions based on a comparison of the runtime-computed trip counts to thresholds for specific loops and also includes support for moving just the required data to the coprocessor. We evaluate the integrated toolflow with characteristic loops executed on different input data sizes.
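The decision block described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' LLVM toolflow: the names `break_even_trip_count`, `make_offload_decision`, `scale_on_cpu` and `scale_on_coprocessor` are hypothetical, the coprocessor version is a plain stand-in function, and the linear cost model used to pick a threshold (fixed offload overhead plus per-iteration time) is an assumption made here.

```python
from typing import Callable, List, Optional, Tuple

Result = Tuple[str, List[int]]


def break_even_trip_count(overhead_ns: int, cpu_ns_per_iter: int,
                          acc_ns_per_iter: int) -> Optional[int]:
    """Smallest trip count n with overhead + n*acc < n*cpu (assumed cost model)."""
    gain = cpu_ns_per_iter - acc_ns_per_iter
    if gain <= 0:
        return None  # the coprocessor never pays off under this model
    return overhead_ns // gain + 1


def make_offload_decision(threshold: int,
                          cpu_version: Callable[[List[int]], Result],
                          coprocessor_version: Callable[[List[int]], Result],
                          ) -> Callable[[List[int]], Result]:
    """Wrap two versions of a loop behind a runtime trip-count check."""
    def run(data: List[int]) -> Result:
        trip_count = len(data)  # known only at runtime, just before the loop
        if trip_count >= threshold:
            return coprocessor_version(data)
        return cpu_version(data)
    return run


def scale_on_cpu(data: List[int]) -> Result:
    return ("cpu", [2 * x for x in data])


def scale_on_coprocessor(data: List[int]) -> Result:
    # Stand-in for generated coprocessor code; computes the same result.
    return ("coprocessor", [2 * x for x in data])


threshold = break_even_trip_count(overhead_ns=100_000,
                                  cpu_ns_per_iter=200,
                                  acc_ns_per_iter=100)
scale = make_offload_decision(threshold, scale_on_cpu, scale_on_coprocessor)

print(scale([1, 2, 3])[0])
print(scale(list(range(5000)))[0])
```

Both versions compute the same result, so the check only affects where the loop runs, not what it returns.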

Citations: 4

A generic pixel distribution architecture for parallel video processing

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032547

Karim M. A. Ali, R. B. Atitallah, S. Hanafi, J. Dekeyser

Abstract: I/O data distribution for neighbourhood operations processed in parallel dominates the multimedia video processing domain. Hardware designers are confronted with the challenge of architecture obsolescence because the I/O system lacks the flexibility to adapt when the parallelism level is upgraded. Reconfigurable computing partially solves the problem through its capability of partitioning hardware according to the application requirements. Taking this aspect into consideration, we propose a generic I/O data distribution model dedicated to parallel video processing. Several parameters can be configured according to the required macro-block size, with the possibility of controlling the sliding step in both the horizontal and vertical directions. The generated model is used as part of a parallel architecture for processing multimedia applications. We implemented our architecture on the Xilinx Zynq ZC706 FPGA evaluation board for two applications: a video downscaler (1:16) and a convolution filter. The efficiency of our system in distributing pixels among parallel IPs is demonstrated through several experiments. The experimental results show the reduction in design effort achieved with the code generation tool, the low hardware cost of our solution, and the flexibility of the model in being configured for different distribution scenarios.
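The configurable macro-block size and sliding step mentioned in the abstract can be modelled in software as a sliding window over a frame. The following is a minimal sketch, not the authors' hardware model: the function names are hypothetical, and the round-robin assignment of blocks to parallel IPs is an assumption chosen here for simplicity.

```python
from typing import Iterator, List

Block = List[List[int]]


def macro_blocks(frame: List[List[int]], block_h: int, block_w: int,
                 step_y: int, step_x: int) -> Iterator[Block]:
    """Yield block_h x block_w windows, sliding by (step_y, step_x)."""
    rows, cols = len(frame), len(frame[0])
    for top in range(0, rows - block_h + 1, step_y):
        for left in range(0, cols - block_w + 1, step_x):
            yield [row[left:left + block_w]
                   for row in frame[top:top + block_h]]


def distribute(blocks: Iterator[Block], num_ips: int) -> List[List[Block]]:
    """Assign blocks to processing IPs in round-robin order (assumed policy)."""
    queues: List[List[Block]] = [[] for _ in range(num_ips)]
    for index, block in enumerate(blocks):
        queues[index % num_ips].append(block)
    return queues


# A 4x4 test frame whose pixel value encodes its position.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]

# Step equal to block size: non-overlapping blocks, as in downscaling.
tiles = list(macro_blocks(frame, 2, 2, step_y=2, step_x=2))

# Step of 1: overlapping neighbourhoods, as in a convolution filter.
windows = list(macro_blocks(frame, 2, 2, step_y=1, step_x=1))

queues = distribute(iter(tiles), num_ips=2)
print(len(tiles), len(windows), [len(q) for q in queues])
```

Setting the step equal to the block size gives disjoint tiles, while a step of 1 gives the overlapping neighbourhoods a convolution needs; only the parameters change, not the distribution logic.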

Citations: 7

Embedding FPGA overlays into configurable Systems-on-Chip: ReconOS meets ZUMA

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032514

T. Wiersema, Arne Bockhorn, M. Platzner

Citations: 22

The FPGA implementation of an image registration algorithm using binary images

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032559

An Hung Nguyen, M. Pickering, A. Lambert

Citations: 2

A hardware architecture for filtering irreducible testors

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032526

V. Rodriguez, José F. Martínez, J. A. Carrasco-Ochoa, M. Lazo-Cortés, R. Cumplido, C. F. Uribe

Citations: 2

Context-aware resources placement for SRAM-based FPGA to minimize checkpoint/recovery overhead

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032506

F. Sahraoui, Ghaffari Fakhreddine, M. A. Benkhelifa, B. Granado

Citations: 8

Dynamic run-time hardware/software scheduling for 3D reconfigurable SoC

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032512

Quang-Hai Khuat, D. Chillet, M. Hübner

Citations: 3

Spiking dynamic neural fields architectures on FPGA

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032557

Benoît Chappet de Vangel, C. Torres-Huitzil, B. Girau

Abstract: Neuromorphic engineering is a very active field that aims to design dedicated hardware architectures to simulate the tremendous power and complexity of the brain at real-time speed. Many large-scale generic projects have been successful, but we focus on decentralized, embeddable implementations of dynamic neural fields (DNFs), a popular building-block approach to simulating high-level cognitive behaviors. The main difficulty of this approach is its mandatory all-to-all connectivity within the neural network, which does not fit hardware constraints. Here we show that it is possible to decentralize the DNF computations using a cellular grid of spiking neurons with stochastic transmissions mapped onto a field-programmable gate array (FPGA). The advantages of these randomly spiking dynamic neural fields (RSDNFs) are a dedicated 1-bit probabilistic XY broadcast routing network with inherent synaptic weight computations that provides hardware compatibility thanks to the 4-neighbor cellular connectivity. Moreover, this implementation strategy exhibits fault tolerance properties, but it is more area-greedy and time-consuming than a standard implementation that broadcasts neuron addresses and coordinates using the address event representation (AER) on a centralized bus.
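One way to see how a probabilistic 1-bit routing network could yield synaptic weights without storing them is a toy model in which every hop between neighbouring cells forwards a spike independently with probability `p`. This model is an assumption made here for illustration, not the authors' circuit; the function names and the value of `p` are hypothetical.

```python
import random
from typing import Tuple

Cell = Tuple[int, int]


def hops_xy(src: Cell, dst: Cell) -> int:
    """Hop count on a 4-neighbour grid when routing along X, then Y."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def expected_weight(p: float, src: Cell, dst: Cell) -> float:
    """Probability that a spike survives every hop (assumed model)."""
    return p ** hops_xy(src, dst)


def spike_arrives(p: float, hops: int, rng: random.Random) -> bool:
    """Simulate one spike: each hop forwards it with probability p."""
    return all(rng.random() < p for _ in range(hops))


def estimate_weight(p: float, hops: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated spikes that reach a cell `hops` away."""
    rng = random.Random(seed)
    arrived = sum(spike_arrives(p, hops, rng) for _ in range(trials))
    return arrived / trials


analytic = expected_weight(0.5, (0, 0), (2, 1))   # 3 hops away
simulated = estimate_weight(0.5, hops=3, trials=20_000)
print(analytic, round(simulated, 3))
```

In this toy model the effective weight decays geometrically with grid distance, so the lateral kernel emerges from the transmission probability rather than from an explicit weight table.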

Citations: 4

A framework for efficient rapid prototyping by virtually enlarging FPGA resources

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI: 10.1109/ReConFig.2014.7032488

Shinya Takamaeda-Yamazaki, Kenji Kise

Citations: 3