{"title":"A hardware/software infrastructure for performance monitoring on LEON3 multicore platforms","authors":"Nam Ho, Paul Kaufmann, M. Platzner","doi":"10.1109/FPL.2014.6927437","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927437","url":null,"abstract":"Monitoring applications at run-time and evaluating the recorded statistical data of the underlying micro architecture is one of the key aspects required by many hardware architects and system designers as well as high-performance software developers. To fulfill this requirement, most modern CPUs for High Performance Computing have been equipped with Performance Monitoring Units (PMU) including a set of hardware counters, which can be configured to monitor a rich set of events. Unfortunately, embedded and reconfigurable systems are mostly lacking this feature. Towards rapid exploration of High Performance Embedded Computing in near future, we believe that supporting PMU for these systems is necessary. In this paper, we propose a PMU infrastructure, which supports monitoring of up to seven concurrent events. The PMU infrastructure is implemented on an FPGA and is integrated into a LEON3 platform. We show also the integration of our PMU infrastructure with the perf_event, which is the standard PMU architecture of the Linux kernel.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125311974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A mixed integer linear programming approach for design space exploration in FPGA-based MPSoC","authors":"Bouthaina Damak, R. Benmansour, S. Niar, M. Baklouti, M. Abid","doi":"10.1109/FPL.2014.6927431","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927431","url":null,"abstract":"Heterogeneous Multiprocessor System-on-Chip (Ht-MPSoC) architectures represent a promising approach as they allow a higher performance/energy consumption trade-off. In such systems, the processor instruction set is enhanced by application-specific custom instructions implemented on reconfigurable fabrics, namely FPGA. To increase area utilization and guarantee application constraint respect, we propose a new architecture where Ht-MPSoC hardware accelerators are shared among different processors in an intelligent manner. In this paper, a Mixed Integer Linear Programming (MILP) model is proposed to systematically explore the complex design space of the different configurations.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"23 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121005618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-FPGA reconfigurable system for accelerating MATLAB simulations","authors":"Muhammed Al Kadi, Max Ferger, Volker Stegemann, M. Hübner","doi":"10.1109/FPL.2014.6927396","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927396","url":null,"abstract":"The use of reconfigurable FPGA devices to support the execution of computationally intensive software tasks is discussed in this paper. A system architecture consisting of multiple serially-connected FPGAs is developed, where each FPGA holds a pool of reconfigurable regions. An accelerator can be reconfigured into a region, replaced or discarded at runtime. Configurable connection blocks are responsible of directing data between any two accelerators. The whole system is connected via PCIe-interface to a host PC, where a middleware layer hides all hardware management operations, e.g. routing the data sent among the accelerators, and provides the end-user with an API to use the whole system. Recently, the very fast interfaces for reconfiguring parts of the used FPGAs minimize the overhead caused for hardware modifications. In addition, a manual design of hardware accelerators is not more needed with the continuously improving quality of high-level synthesis tools. In this paper, we considered the case where our system is used within MATLAB. We build a small library to compare and improve upon the execution times of some often used functions.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133552472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical reconfiguration of FPGAs","authors":"Dirk Koch, Christian Beckhoff","doi":"10.1109/FPL.2014.6927491","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927491","url":null,"abstract":"Partial reconfiguration allows some applications to substantially save FPGA area by time sharing resources among multiple modules. In this paper, we push this approach further by introducing hierarchical reconfiguration where reconfigurable modules can have reconfigurable submodules. This is useful for complex systems where many modules have common parts or where modules can share components. For such systems, we show that the number of bitstreams and the bitstream storage requirements can be scaled down from a multiplicative to an additive behavior with respect to the number of modules and submodules. A case study consisting of different reconfigurable softcore CPUs and hierarchically reconfigurable custom instruction set extensions demonstrates a 18.7× lower bitstream storage requirement and up to 10× faster reconfiguration speed when using hierarchical reconfiguration instead of using conventional single-level module-based reconfiguration.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132296715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A logic cell architecture exploiting the shannon expansion for the reduction of configuration memory","authors":"Qian Zhao, Kyosei Yanagida, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi","doi":"10.1109/FPL.2014.6927460","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927460","url":null,"abstract":"Most modern field-programmable gate arrays (FPGAs) employ a look-up table (LUT) as their basic logic cell. Although a k-input LUT can implement any k-input logic, its functionality relies on a large amount of configuration memory. As FPGA scales improve, the increased quantity of configuration memory cells required for FPGAs will require a larger area and consume more power. Moreover, the soft-error rate per device will also increase as more configuration memory cells are embedded. We propose scalable logic modules (SLMs), logic cells requiring less configuration memory, reducing configuration memory by making use of partial functions of Shannon expansion for frequently appearing logics. Experimental results show that SLM-based FPGAs use much less configuration memory and have smaller area than conventional LUT-based FPGAs.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132540680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A soft-core processor for finite field arithmetic with a variable word size accelerator","authors":"Aiko Iwasaki, Keisuke Dohi, Yuichiro Shibata, K. Oguri, Ryuichi Harasawa","doi":"10.1109/FPL.2014.6927388","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927388","url":null,"abstract":"This paper presents implementation and evaluation of an accelerator architecture for soft-cores to speed up reduction process for the arithmetic on GF(2^m) used in Elliptic Curve Cryptography (ECC) systems. In this architecture, the word size of the accelerator can be customized when the architecture is configured on an FPGA. Focusing on the fact that the number of the reduction processing operations on GF(2^m) is affected by the irreducible polynomial and the word size, we propose to employ an unconventional word size for the accelerator depending on a given irreducible polynomial and implement a MIPS-based soft-core processor coupled with a variable-word size accelerator. As a result of evaluation with several polynomials, it was shown that the performance improvement of up to 10.2 times was obtained compared to the 32-bit word size, even taking into account the maximum frequency degradation of 20.4% caused by changing the word size. The advantage of using unconventional word sizes was also shown, suggesting the promise of this approach for low-power ECC systems.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130052319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated framework for FPGA-based parallel genetic algorithms","authors":"Liucheng Guo, David B. Thomas, Ce Guo, W. Luk","doi":"10.1109/FPL.2014.6927501","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927501","url":null,"abstract":"Parallel genetic algorithms (pGAs) are a variant of genetic algorithms which can promise substantial gains in both efficiency of execution and quality of results. pGAs have attracted researchers to implement them in FPGAs, but the implementation always needs large human effort. To simplify the implementation process and make the hardware pGA designs accessible to potential non-expert users, this paper proposes a general-purpose framework, which takes in a high-level description of the optimisation target and automatically generates pGA designs for FPGAs. Our pGA system exploits the two levels of parallelism found in GA instances and genetic operations, allowing users to tailor the architecture for resource constraints at compile-time. The framework also enables users to tune a subset of parameters at run-time without time-consuming recompilation. Our pGA design is more flexible than previous ones, and has an average speedup of 26 times compared to the multi-core counterparts over five combinatorial and numerical optimisation problems. When compared with a GPU, it also shows a 6.8 times speedup over a combinatorial application.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"123 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116435042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High throughput channel tracking for JTRS wireless channel emulation","authors":"Dajung Lee, J. Matai, B. T. Weals, R. Kastner","doi":"10.1109/FPL.2014.6927410","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927410","url":null,"abstract":"Testing and verifying wireless systems in a real world environments is a challenging but an important problem. This is particular true for the Joint Tactical Radio System (JTRS) where the modulation techniques are optimized towards environments that are difficult to reproduce (e.g., ship to plane, plane to satellite communications). Such cases necessitate a wireless channel emulator to facilitate testing in the laboratory as the protocols are being developed. Furthermore, the increasing complexity of communications protocols and highly variable network scenarios force the channel emulator to support an accurate and complicated channel model that can scale to handle a large number of radios that operate across a wide frequency spectrum. We developed a unique channel impairment emulator prototype to meet these requirements. It maximizes the scalability and performance, operating in a frequency range of 2 MHz to 2 GHz. Moreover, our emulator design accommodates radio operation that use unknown frequency hopping techniques, which is increasingly common in JTRS systems. This key feature to this system is a high throughput channel tracker module that handles high bandwidth intermediate frequency (IF) signals while providing the scalability to handle a large number of channels.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133248980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dynamically adaptable bus architecture for trading-off among performance, consumption and dependability in Cyber-Physical Systems","authors":"J. Valverde, Alfonso Rodríguez, Julio Camarero, A. Otero, J. Portilla, E. D. L. Torre, T. Riesgo","doi":"10.1109/FPL.2014.6927394","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927394","url":null,"abstract":"Cyber-Physical Systems need to handle increasingly complex tasks, which additionally, may have variable operating conditions over time. Therefore, dynamic resource management to adapt the system to different needs is required. In this paper, a new bus-based architecture, called ARTICo3, which by means of Dynamic Partial Reconfiguration, allows the replication of hardware tasks to support module redundancy, multi-thread operation or dual-rail solutions for enhanced side-channel attack protection is presented. A configuration-aware data transaction unit permits data dispatching to more than one module in parallel, or provide coalesced data dispatching among different units to maximize the advantages of burst transactions. The selection of a given configuration is application independent but context-aware, which may be achieved by the combination of a multi-thread model similar to the CUDA kernel model specification, combined with a dynamic thread/task/kernel scheduler. A multi-kernel application for face recognition is used as an application example to show one scenario of the ARTICo3 architecture.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124845618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"THOR - The hardware onion router","authors":"T. Güneysu, F. Regazzoni, Pascal Sasdrich, M. Wójcik","doi":"10.1109/FPL.2014.6927408","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927408","url":null,"abstract":"Security and privacy of data traversing internet have always been a major concern for all users. In this context, The Onion Routing (Tor) is the most successful protocol to anonymize global Internet traffic and is widely deployed as software on many personal computers or servers. In this paper, we explore the potential of modern reconfigurable devices to efficiently realize the Tor protocol on embedded devices. In particular, this targets the acceleration of the complex cryptographic operations involved in the handshake of routing nodes and the data stream encryption. Our hardware-based implementation on the Xilinx Zynq platform outperforms previous embedded solutions by more than a factor of 9 with respect to the cryptographic handshake - ultimately enabling quite inexpensive but highly efficient routers. Hence, we consider our work as a further milestone towards the development and the dissemination of low-cost and high performance onion relays that hopefully ultimately leads again to a more private Internet.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124941091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}