{"title":"400 Gbps energy-efficient multi-field packet classification on FPGA","authors":"Shijie Zhou, Sihan Zhao, V. Prasanna","doi":"10.1109/ReConFig.2014.7032486","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032486","url":null,"abstract":"Packet classification is a network kernel function that has been widely researched over the past decade. However, most previous work has focused only on achieving high throughput without considering its energy-efficiency implications. With the rapid growth of the Internet, energy-efficiency has become an important metric for networks. We present the design of an energy-efficient packet classifier on Field-Programmable Gate Arrays (FPGA). The classifier is arranged as a 2-dimensional array of processing elements to enable sustained high throughput. We developed a memory activation scheduling technique that is able to significantly reduce memory power dissipation by selectively activating memory blocks. We conducted experiments using real-life rule sets and packet traces to evaluate our design. The experimental results show that with the memory activation scheduling technique, our design achieves 1.8× greater energy-efficiency compared with a baseline implementation without this energy optimization. With 6 individual classifiers on a single chip and a rule set of size 1K, our design sustains a throughput of 400 Gbps for minimum size (40 bytes) packets and can process over 100 Gbps network traffic per Joule. Compared with state-of-the-art solutions, we achieve over 1.7× improvement in energy-efficiency.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131814604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deferring accelerator offloading decisions to application runtime","authors":"G. Vaz, Heinrich Riebler, Tobias Kenter, Christian Plessl","doi":"10.1109/ReConFig.2014.7032509","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032509","url":null,"abstract":"Reconfigurable architectures provide an opportunity to accelerate a wide range of applications, frequently by exploiting data-parallelism, where the same operations are homogeneously executed on a (large) set of data. However, when the sequential code is executed on a host CPU and only data-parallel loops are executed on an FPGA coprocessor, a sufficiently large number of loop iterations (trip counts) is required, such that the control- and data-transfer overheads to the coprocessor can be amortized. The trip count of large data-parallel loops is frequently not known at compile time, but only at runtime just before entering a loop. Therefore, we propose to generate code both for the CPU and the coprocessor, and to defer the decision of where to execute the code to application runtime, when the trip count of the loop can be determined. We demonstrate how an LLVM compiler based toolflow can automatically insert appropriate decision blocks into the application code. Analyzing popular benchmark suites, we show that this kind of runtime decision is often applicable. The practical feasibility of our approach is demonstrated by a toolflow that automatically identifies loops suitable for vectorization and generates code for the FPGA coprocessor of a Convey HC-1. The toolflow adds decisions based on a comparison of the runtime-computed trip counts to thresholds for specific loops and also includes support to move just the required data to the coprocessor. We evaluate the integrated toolflow with characteristic loops executed on different input data sizes.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"415 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131528691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generic pixel distribution architecture for parallel video processing","authors":"Karim M. A. Ali, R. B. Atitallah, S. Hanafi, J. Dekeyser","doi":"10.1109/ReConFig.2014.7032547","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032547","url":null,"abstract":"I/O data distribution for neighbourhood operations processed in parallel computing dominates the multimedia video processing domain. Hardware designers are confronted with the challenge of architecture obsolescence due to the lack of flexibility to adapt the I/O system while upgrading the parallelism level. The usage of reconfigurable computing solves the problem partially with the capability of hardware partitioning according to the application requirements. Taking this aspect into consideration, we propose a generic I/O data distribution model dedicated to parallel video processing. Several parameters can be configured according to the required size of macro-block with the possibility to control the sliding step in both horizontal and vertical directions. The generated model is used as a part of the parallel architecture processing multimedia applications. We implemented our architecture on the Xilinx Zynq ZC706 FPGA evaluation board for two applications: the video downscaler (1:16) and the convolution filter. The efficiency of our system for distributing pixels among parallel IPs is demonstrated through several experiments. The experimental results show the decrease in the design effort using the code generation tool, the low hardware cost of our solution, and how flexibly the model can be configured for different distribution scenarios.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123665949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedding FPGA overlays into configurable Systems-on-Chip: ReconOS meets ZUMA","authors":"T. Wiersema, Arne Bockhorn, M. Platzner","doi":"10.1109/ReConFig.2014.7032514","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032514","url":null,"abstract":"Virtual FPGAs are overlay architectures realized on top of physical FPGAs. They are proposed to enhance or abstract away from the physical FPGA for experimenting with novel architectures and design tool flows. In this paper, we present an embedding of a ZUMA-based virtual FPGA fabric into a complete configurable system-on-chip. Such an embedding is required to fully harness the potential of virtual FPGAs, in particular to give the virtual circuits access to main memory and operating system services, and to enable a concurrent operation of virtualized and non-virtualized circuitry. We discuss our extension to ZUMA and its embedding into the ReconOS operating system for hardware/software systems. Furthermore, we present an open source tool flow to synthesize configurations for the virtual FPGA.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116526353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The FPGA implementation of an image registration algorithm using binary images","authors":"An Hung Nguyen, M. Pickering, A. Lambert","doi":"10.1109/ReConFig.2014.7032559","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032559","url":null,"abstract":"The FPGA implementation of image registration algorithms is a challenging problem due to the limited resources of the hardware and the requirement for real-time processing speeds. Image registration approaches using low bit-resolution images are more feasible for implementation on FPGAs than those using full resolution images because of the significant reduction in hardware resources required. The real-time processing requirement can also be satisfied with the use of simple logic operations such as AND, XOR and NOT instead of more complex computations such as additions and multiplications. This paper presents the implementation of an image registration algorithm on two FPGAs from the SPARTAN-3E family for the case of translational motion.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127404291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hardware architecture for filtering irreducible testors","authors":"V. Rodriguez, José F. Martínez, J. A. Carrasco-Ochoa, M. Lazo-Cortés, R. Cumplido, C. F. Uribe","doi":"10.1109/ReConFig.2014.7032526","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032526","url":null,"abstract":"Feature selection in pattern recognition is a problem whose space complexity grows exponentially with the number of attributes in a dataset. There are several hardware implementations of algorithms for overcoming this complexity. These hardware architectures rely on a software component for filtering irreducible feature subsets, which is a computationally complex task. In this paper, a new hardware module for the filtering process is presented. The main advantage of this new architecture is that no additional time is required for hardware execution whilst the software component is no longer needed. Experimental results show that the runtime for software is of the same order of magnitude as for hardware in some cases. The proposed architecture is algorithm independent and may lead to smaller hardware realizations than previous architectures.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125997710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-aware resources placement for SRAM-based FPGA to minimize checkpoint/recovery overhead","authors":"F. Sahraoui, Ghaffari Fakhreddine, M. A. Benkhelifa, B. Granado","doi":"10.1109/ReConFig.2014.7032506","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032506","url":null,"abstract":"Existing SRAM-based Field Programmable Gate Arrays (FPGAs) are very sensitive to Single Event Effects (SEE) phenomena in harsh environments. To protect applications running on SRAM-based FPGAs from SEE, those applications mainly rely on resource redundancy approaches, which involve significant resource overhead. Newly proposed fault mitigation approaches use Partial Dynamic Reconfiguration to overcome the huge overhead of redundancy methods. In [1] a Backward Error Recovery (BER) approach based on Partial Dynamic Reconfiguration (PDR) is proposed. Nevertheless, such an approach suffers greatly from a time latency issue. In this paper, we introduce a new context-aware resources placement strategy to minimize the time overhead induced by the BER fault mitigation approach. Both checkpoint and recovery overhead are evaluated with and without our context-aware resources placement strategy. A reduction of up to 71 % of context frame is reported.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"274 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123719079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic run-time hardware/software scheduling for 3D reconfigurable SoC","authors":"Quang-Hai Khuat, D. Chillet, M. Hübner","doi":"10.1109/ReConFig.2014.7032512","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032512","url":null,"abstract":"In this paper, we present a new online hardware/software (HW/SW) scheduling algorithm for a 3D Reconfigurable SoC platform comprising a multiprocessors layer and a heterogeneous reconfigurable layer. The proposed algorithm decides on the fly whether the tasks will run in SW or HW, at which time, on which processor or in which region of the reconfigurable layer in order to minimize the overall execution time of the application. It evaluates, during runtime, the interest to continue the SW execution of a task or to cancel it for starting a new HW execution of this task from the initial state. By using our algorithm called Hardware/Software algorithm with Software execution Prediction (HSSP), the overall execution time can be reduced by 26 % compared with other existing HW/SW scheduling methods.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123778238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spiking dynamic neural fields architectures on FPGA","authors":"Benoît Chappet de Vangel, C. Torres-Huitzil, B. Girau","doi":"10.1109/ReConFig.2014.7032557","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032557","url":null,"abstract":"Neuromorphic engineering is a very active field aiming to design dedicated hardware architectures to simulate the tremendous power and complexity of the brain at real-time speed. Many large-scale generic projects have been successful, but we focus on decentralized embeddable implementations of dynamic neural fields (DNFs): a popular building-block approach to simulating high-level cognitive behaviors. The main difficulty of this approach is its mandatory all-to-all connectivity within the neural network, which does not fit hardware constraints. Here we show that it is possible to decentralize the DNF computations using a cellular grid of spiking neurons with stochastic transmissions mapped onto a field programmable gate array (FPGA). The advantages of these randomly spiking dynamic neural fields (RSDNFs) are a dedicated 1-bit probabilistic XY broadcast routing network with inherent synaptic weights computations that provides hardware compatibility thanks to the 4-neighbor cellular connectivity. Moreover, this implementation strategy exhibits fault tolerance properties, but it is more area greedy and time consuming than a standard implementation that broadcasts neuron addresses and coordinates using the address event representation (AER) on a centralized bus.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132723115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for efficient rapid prototyping by virtually enlarging FPGA resources","authors":"Shinya Takamaeda-Yamazaki, Kenji Kise","doi":"10.1109/ReConFig.2014.7032488","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032488","url":null,"abstract":"Rapid prototyping using FPGAs is a widely-applied approach for efficient evaluation of hardware structures. We present a rapid prototyping framework by virtually enlarging available FPGA resources. In order to mitigate the development complexity of FPGA-based hardware prototype, the framework provides two abstractions of resources on FPGA platforms: Memory systems and inter-FPGA interconnections on multi-FPGA platforms. The framework enables designers to draw up a target hardware using abstract interfaces as ideal memory systems and interconnections on FPGA platforms. Our evaluation result shows that the slowdowns in running speed under the abstractions are not critical, so that the framework offers the helpful support to develop a high-speed and accurate hardware prototype rapidly.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130729907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}