Leonardo Suriano, Alfonso Rodríguez, K. Desnos, M. Pelcat, E. D. L. Torre
{"title":"Analysis of a heterogeneous multi-core, multi-hw-accelerator-based system designed using PREESM and SDSoC","authors":"Leonardo Suriano, Alfonso Rodríguez, K. Desnos, M. Pelcat, E. D. L. Torre","doi":"10.1109/ReCoSoC.2017.8016151","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016151","url":null,"abstract":"New heterogeneous system technologies are flooding the market: over the past years, devices have moved from single CPUs to multi-core platforms featuring CPUs, GPUs and large FPGAs, such as Xilinx Zynq-7000 or Zynq UltraScale+ MPSoC architectures. In this context, it is important to provide developers with transparent deployment capabilities to efficiently execute different applications on such complex devices. This paper presents a design flow that combines PREESM, a dataflow-based prototyping framework, with Xilinx SDSoC, an HLS-based framework to automatically generate and manage hardware accelerators. This integration combines the automatic, static task scheduling obtained from PREESM with asynchronous invocations that trigger the parallel execution of multiple hardware accelerators from some of their associated sequential software threads. 
An image processing application is used as a proof of concept, showing the interoperability possibilities of both tools, the level of design automation achieved and, for the resulting computing architecture, the good performance scalability according to the number of accelerators and sw threads.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"476 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128326853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Domingo, R. Salvador, H. Fabelo, D. Madroñal, S. Ortega, R. Lazcano, E. Juárez, G. Callicó, C. Sanz
{"title":"High-level design using Intel FPGA OpenCL: A hyperspectral imaging spatial-spectral classifier","authors":"R. Domingo, R. Salvador, H. Fabelo, D. Madroñal, S. Ortega, R. Lazcano, E. Juárez, G. Callicó, C. Sanz","doi":"10.1109/ReCoSoC.2017.8016152","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016152","url":null,"abstract":"Current computational demands require increasing designer efficiency and system performance per watt. A broadly accepted solution for efficient accelerator implementation is reconfigurable computing. However, typical HDL methodologies require very specific skills and a considerable amount of designer time. Despite new approaches to high-level synthesis like OpenCL, given the large heterogeneity in today's devices (manycore, CPUs, GPUs, FPGAs), there is no one-size-fits-all solution, so platform-driven optimization is needed to maximize performance. This paper reviews some recent works using Intel FPGA SDK for OpenCL and the strategies for optimization, evaluating the framework for the design of a hyperspectral image spatial-spectral classifier accelerator. Results are reported for a Cyclone V SoC using Intel FPGA OpenCL Offline Compiler 16.0 out-of-the-box. From a common baseline C implementation running on the embedded ARM® Cortex®-A9, OpenCL-based synthesis is evaluated applying different generic and vendor-specific optimizations. Results show that reasonable speedups are obtained in a device with scarce computing and embedded memory resources. 
A great step appears to have been taken toward effectively raising the abstraction level, but a considerable amount of HW design skill is still needed.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117217326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Abdali, M. Pelcat, F. Berry, J. Diguet, F. Palumbo
{"title":"Exploring the performance of partially reconfigurable point-to-point interconnects","authors":"E. Abdali, M. Pelcat, F. Berry, J. Diguet, F. Palumbo","doi":"10.1109/ReCoSoC.2017.8016160","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016160","url":null,"abstract":"An ever larger share of FPGAs are supporting Dynamic and Partial Reconfiguration (DPR). A reconfigurable point-to-point interconnect (ρ-P2P) is a communication mechanism based on DPR that swaps between different precomputed configurations stored in partial bitstreams. ρ-Point-to-Point (P2P) is intended as a lightweight interconnect that suits the reconfigurable systems where a limited number of configurations are desirable. This paper assesses the pros and cons of ρ-P2P in terms of resource and performance depending on the number of input/output signals, their width and the number of supported configurations. Experimental results, conducted on an Intel Cyclone V FPGA, compare ρ-P2P to an equivalently functional non-DPR solution called μ-P2P and to a full crossbar. They show that ρ-P2P is indeed lightweight but introduces performance limitations on operating frequency, memory footprint and reconfiguration time. However, ρ-P2P is in general the least resource intensive of the tested interconnects, except in the trivial case of low numbers of signals and configurations. In particular, an 18 × 18 full crossbar interconnect requires 75% more resources than an equivalent ρ-P2P. 
Interestingly, this resource difference between ρ-P2P and a full crossbar grows linearly with the interconnect size.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"8 Pt 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126270208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Joseph, Lennart Bamberg, Sven Wrieden, Dominik Ermel, A. Ortiz, Thilo Pionteck
{"title":"Design method for asymmetric 3D interconnect architectures with high level models","authors":"J. Joseph, Lennart Bamberg, Sven Wrieden, Dominik Ermel, A. Ortiz, Thilo Pionteck","doi":"10.1109/ReCoSoC.2017.8016143","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016143","url":null,"abstract":"New 3D production methods enable heterogeneous integration of dies manufactured in different technology nodes. Asymmetric 3D interconnect architectures (A-3D-IAs) are the communication infrastructure targeting these heterogeneous 3D systems on chip (3D SoCs), for which design methodologies and design tools are still missing. Here, a design method is proposed following an incremental approach enabled by high level models. To this end, we present the first simulator and design framework covering the diverse requirements of A-3D-IAs. This includes an abstract model to estimate the application specific energy consumption of 2D metal wires and 3D through silicon vias (TSVs) in an A-3D-IA. It is validated by circuit simulations in combination with an electromagnetic field solver which is used for the extraction of the TSV array equivalent circuit. The model operates at a high abstraction level for fast simulations. Nonetheless, for real data stream scenarios it still shows a small maximum error of less than 8%. 
Additionally, a mathematical description is presented which enables a fast evaluation of low power coding schemes for A-3D-IA on a high level of abstraction.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124793972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peter Rouget, Benoît Badrignans, P. Benoit, L. Torres
{"title":"SecBoot — lightweight secure boot mechanism for Linux-based embedded systems on FPGAs","authors":"Peter Rouget, Benoît Badrignans, P. Benoit, L. Torres","doi":"10.1109/ReCoSoC.2017.8016144","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016144","url":null,"abstract":"In recent years, the need for security in embedded devices and data centers has increased sharply. The possible consequences of attacks on such equipment make it a privileged target. In these fields, FPGAs are increasingly used because of their flexibility and constantly decreasing power consumption and cost: they can embed several hard/soft processors running Linux, enhancing system integration. This paper discusses the security issues related to operating system boot security on FPGAs. We show how the early software boot stages can be protected using FPGA built-in security mechanisms and user logic. We consider that external memories can be tampered with by software attacks or board level attacks. By using open source elements and standard tools, we present and implement a lightweight solution. We show that the dynamic reconfiguration has nearly no impact on usable resources of the FPGA matrix at the end of the boot process.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129857515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fatemeh Arezoomand, Arghavan Asad, M. Fazeli, M. Fathy, F. Mohammadi
{"title":"Energy aware and reliable STT-RAM based cache design for 3D embedded chip-multiprocessors","authors":"Fatemeh Arezoomand, Arghavan Asad, M. Fazeli, M. Fathy, F. Mohammadi","doi":"10.1109/ReCoSoC.2017.8016154","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016154","url":null,"abstract":"In nano-scale technologies, static power consumption due to leakage current has become a serious issue in the design of SRAM based on-chip cache memories. To address this issue, non-volatile memory technologies such as STT-RAM (Spin Transfer Torque-RAM) have been proposed as a replacement for SRAM cells due to their near zero static power consumption and high memory density. Nonetheless, STT-RAMs suffer from failures such as read disturb and limited endurance, as well as high switching energy. One effective way to decrease the STT-RAMs' switching energy is to reduce their retention time; however, reducing the retention time has a negative impact on the reliability of STT-RAM cells. In this paper, we propose a hybrid cache layer for an embedded 3D-Chip Multiprocessor which employs two types of STT-RAM memory banks with retention times of 1s and 10ms to provide a beneficial tradeoff between reliability, energy consumption, and performance. To this end, we also propose an optimization model to find the optimal configurations for these two kinds of memory banks. 
Simulation results using the Gem5 simulator through comparisons with fully SRAM and fully STT-RAM based cache show that the proposed hybrid cache consumes significantly less power while offering higher throughput (instructions per cycle) compared to a fully STT-RAM based cache.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131703462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated system-to-service authentication and authorization combining PUFs and tokens","authors":"Marta Beltrán, Miguel Calvo, Sergio Gonzalez","doi":"10.1109/ReCoSoC.2017.8016157","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016157","url":null,"abstract":"Different application domains are challenging the still immature access control mechanisms currently used to authenticate and to authorize system-on-chip architectures to services deployed locally or in the cloud. These domains include Internet of Things, Smart Places or Industry 4.0 where different kinds of devices and objects, often poorly physically protected, low-cost and energy-constrained, interact with different kinds of services through lightweight communication protocols. These protocols usually guarantee basic data confidentiality and integrity, securing communication channels using cryptography, but there are still important challenges related to authentication and authorization. This work proposes a new system-to-service authentication and authorization mechanism based on the combination of a Physical Unclonable Function (PUF) and two tokens (one devoted to authentication and the other devoted to authorization), capable of working over HTTP or CoAP relying on federated schemes and adapted to the specific requirements of this kind of environment. 
The new mechanism is validated and its efficiency and security are evaluated using a real healthcare case study.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"89-90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123146765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System-level design for communication-centric task farm applications","authors":"Daniela Genius, L. Apvrille","doi":"10.1109/ReCoSoC.2017.8016145","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016145","url":null,"abstract":"Massively parallel applications such as telecommunication and video streaming have the particularity that a large proportion of the time is spent on accessing communication channels between the tasks, due to contention on the on-chip interconnect. Moreover, the analysis of a given task deployment is often tedious. Thus, we propose to extend an existing easy-to-use System-level Design methodology to task farm applications. The contribution first concerns adding relevant SysML modeling elements to take into account application code, hardware platforms and deployment constraints. Secondly, new modeling elements, including access techniques to communication channels, must be given a semantics in order to transform models into a well-defined SystemC virtual prototyping MPSoC platform. A telecommunication application serves as an example.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128161926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hamidreza Ahmadian, Farzad Nekouei, R. Obermaisser
{"title":"Fault recovery and adaptation in time-triggered Networks-on-Chips for mixed-criticality systems","authors":"Hamidreza Ahmadian, Farzad Nekouei, R. Obermaisser","doi":"10.1109/ReCoSoC.2017.8016149","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016149","url":null,"abstract":"Adaptivity in terms of fault recovery and energy efficiency, along with mixed-criticality support, is demanded in today's embedded systems. Safety-critical systems are expected to switch between precomputed resource allocations at runtime based on the monitored information from the platform. In addition, those systems are expected to adjust their internal behavior with regard to a change in the environment, while operating at a desired safety level. At the same time, resource requests in such systems can be highly dynamic and data dependent. Aiming at meeting a superset of all worst case demands leads to unaffordable overheads in terms of resource utilization. Hence, efficient resource management mechanisms are required to provide fault recovery and to make the system adaptive to changes in the environment or the resource requests, while keeping the system in a safe state. 
This paper introduces a solution for supporting resource management in networks-on-chips that fulfills the requirements of adaptive mixed-criticality systems and proposes an architecture that establishes fault recovery by switching between precomputed resource allocations based on the statistical and diagnostic information.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125334089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Current mode detection in hard real-time automotive applications dedicated to many-core platforms","authors":"P. Dziurzański, T. Maka","doi":"10.1109/ReCoSoC.2017.8016162","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2017.8016162","url":null,"abstract":"This paper proposes a technique for determining the current mode in an electronic control unit (ECU) during run-time. We use a decision tree classifier which observes the latest execution times of processes (runnables). When a mode change is detected, the migration of runnables is performed to decrease the number of active cores, leading to considerable energy savings without violating any timing constraints. The proposed approach consists of both off-line and on-line steps, with the more computationally intensive steps performed statically. In the presented automotive use case, the current mode is detected with 100% accuracy while observing the execution time of a single particular runnable. The migration time of systems with dynamic mode detection based on the runnable execution time with various periods is also provided.","PeriodicalId":393701,"journal":{"name":"2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122529582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}