{"title":"A lightweight infrastructure for the dynamic creation and configuration of virtual platforms","authors":"C. Sauer, Hans-Peter Löb","doi":"10.1109/SAMOS.2015.7363701","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363701","url":null,"abstract":"Virtual prototypes leverage SystemC/TLM for simulating programmable platforms comprising 100s of modules. Their efficient creation and configuration is vital for acceptable turnaround times, e.g., during performance exploration or software development. Therefore, our lightweight infrastructure provides a factory creating designs from abstract descriptions of module instances, properties, and connections. Modules mark properties as creation or runtime parameters. The resulting generic design descriptions are usable by non-experts and enable front-ends. The infrastructure is a small C++ library that can be combined with existing SystemC/TLM models and simulation kernels. An industrial case study of a complex multiprocessor SoC shows a distinct productivity gain.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129723949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPSoCSim: An extended OVP simulator for modeling and evaluation of Network-on-Chip based heterogeneous MPSoCs","authors":"P. Wehner, J. Rettkowski, Tobias Kleinschmidt, D. Göhringer","doi":"10.1109/SAMOS.2015.7363704","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363704","url":null,"abstract":"In this paper a SystemC simulator for Network-on-Chip (NoC) based Multiprocessor Systems-on-Chip (MPSoCs) is presented. The simulator currently supports mesh topology with wormhole switching and several routing algorithms such as XY-, a minimal West-First and an adaptive West-First algorithm. The impact of routing algorithms regarding performance can be analyzed by means of the presented simulator. In order to simulate a heterogeneous MPSoC, ARM processors and MicroBlazes can be attached to the NoC. Processor and peripheral models used within the test platforms are provided by Imperas/OVP. Moreover, traffic generators are available to analyze the system. An additional SystemC component enables the readout of simulation time from within the application. For evaluation of the simulator multiple platforms and applications were put under test and compared with a hardware implementation. The comparison shows that the simulator improves the development of MPSoCs by early estimation of system requirements.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132438707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HEVC in-loop filters GPU parallelization in embedded systems","authors":"D. Souza, A. Ilic, N. Roma, L. Sousa","doi":"10.1109/SAMOS.2015.7363667","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363667","url":null,"abstract":"The added encoding efficiency and visual quality that is offered by the latest HEVC standard is mostly attained at the cost of a significant increase of the computational complexity at both the encoder and decoder. However, such added complexity greatly compromises the implementation of this standard in computational and energy constrained devices, including embedded systems, mobile and battery supplied devices. To circumvent this limitation, this paper proposes the exploitation of embedded GPU devices already equipping many state of the art SoCs to accelerate the HEVC in-loop filters (i.e. deblocking filter and sample adaptive offset). The presented approaches comprehensively exploit both fine and coarse-grained parallelization opportunities of these filters in an NVIDIA Tegra GPU. According to the conducted experimental evaluation, the proposed approach showed to be a remarkable strategy to satisfy the real-time requirements of the HEVC decoder, being able to filter each Ultra HD 4K intra frame in less than 20 ms (about 50 fps).","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134530157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application autotuning to support runtime adaptivity in multicore architectures","authors":"D. Gadioli, G. Palermo, C. Silvano","doi":"10.1109/SAMOS.2015.7363673","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363673","url":null,"abstract":"In this work, we introduce an application autotuning framework to dynamically adapt applications in multicore architectures. In particular, the framework exploits design-time knowledge and multi-objective requirements expressed by the user, to drive the autotuning process at the runtime. It also exploits a monitoring infrastructure to get runtime feed-back and to adapt to external changing conditions. The intrusiveness of the autotuning framework in the application (in terms of refactoring and lines of code to be added) has been kept limited, also to minimize the integration cost. To assess the proposed framework, we carried out an experimental campaign to evaluate the overhead, the relevance of the described features and the efficiency of the framework.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129615562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance evaluation of image noise reduction computing on a mobile platform","authors":"J. Hannuksela, M. Niskanen, Markus Turtinen","doi":"10.1109/SAMOS.2015.7363694","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363694","url":null,"abstract":"Noise reduction is one of the most fundamental digital image processing challenges. On mobile devices, proper solutions for this task can significantly increase the output image quality making the use of a camera even more attractive for customers. The main challenge is that the processing time and energy efficiency must be optimized, since the response time and the battery life are critical factors for all mobile applications. To identify the solutions that maximize the real-time performance, we compare several different implementations in terms of computational performance and energy efficiency. Specifically, we compare the OpenCL based design with multithreaded and NEON accelerated implementations and analyze them on the mobile platform. Based on the results of this study, the OpenCL framework provides a viable energy efficient alternative for implementing computer vision algorithms.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132244624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Domain Virtual Prototyping in a SystemC SIL framework: A heating system case study","authors":"Nikolaos Ilieskou, M. Blom, L. Somers, M. Reniers, T. Basten","doi":"10.1109/SAMOS.2015.7363687","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363687","url":null,"abstract":"This paper presents a proof-of-concept for a modular SystemC SIL (Software-in-the-Loop) simulation environment, using a blackboard-like architecture. The proposed SIL framework integrates embedded control software with simulators developed in SystemC/SystemC-AMS or external tools, like MATLAB. The environment has been validated by a heating application for a professional printer, as example of an MDVP (Multi-Domain Virtual Prototyping) application. Our goal is to evaluate the use of SystemC/SystemC-AMS and to address the challenges in developing multiple-domain prototypes and blackboard-like SIL frameworks using this technology.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123362207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking memory system design for data-intensive computing","authors":"O. Mutlu","doi":"10.1109/SAMOS.2015.7363650","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363650","url":null,"abstract":"Summary form only given. The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck. At the same time, DRAM and flash technologies are experiencing difficult technology scaling challenges that make the maintenance and enhancement of their capacity, energy-efficiency, and reliability significantly more costly with conventional techniques. In this talk, we examine some promising research and design directions to overcome challenges posed by memory scaling. Specifically, we discuss three key solution directions: 1) enabling new memory architectures, functions, interfaces, and better integration of the memory and the rest of the system, 2) designing a memory system that intelligently employs multiple memory technologies and coordinates memory and storage management using non-volatile memory technologies, 3) providing predictable performance and QoS to applications sharing the memory/storage system. If time permits, we may also briefly describe our ongoing related work in combating scaling challenges of NAND flash memory.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117301317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient dual-ISA support in a retargetable, asynchronous Dynamic Binary Translator","authors":"T. Spink, Harry Wagstaff, Björn Franke, N. Topham","doi":"10.1109/SAMOS.2015.7363665","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363665","url":null,"abstract":"Dynamic Binary Translation (DBT) allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Some modern DBT systems decouple their main execution loop from the built-in Just-In-Time (JIT) compiler, i.e. the JIT compiler can operate asynchronously in a different thread without blocking program execution. However, this creates a problem for target architectures with dual-ISA support such as ARM/THUMB, where the ISA of the currently executed instruction stream may be different to the one processed by the JIT compiler due to their decoupled operation and dynamic mode changes. In this paper we present a new approach for dual-ISA support in such an asynchronous DBT system, which integrates ISA mode tracking and hot-swapping of software instruction decoders. We demonstrate how this can be achieved in a retargetable DBT system, where the target ISA is not hard-coded, but a processor-specific module is generated from a high-level architecture description. We have implemented ARM V5T support in our DBT and demonstrate execution rates of up to 1148 MIPS for the SPEC CPU 2006 benchmarks compiled for ARM/THUMB, achieving on average 192%, and up to 323%, of the speed of QEMU, which has been subject to intensive manual performance tuning and requires significant low-level effort for retargeting.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115333251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A virtual platform for exploring hierarchical interconnection for many-accelerator systems","authors":"Efstathios Sotiriou-Xanthopoulos, S. Xydis, K. Siozios, G. Economakos","doi":"10.1109/SAMOS.2015.7363703","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363703","url":null,"abstract":"The advent of many-accelerator Systems-on-Chip (SoC), as a result of the ever increasing demands for high performance and energy efficiency, has led to the need for new interconnection schemes among the system components, which minimize the communication overhead. Towards this need, Hierarchical Networks-on-Chip (HNoCs) can provide an efficient communication paradigm for such systems: Each node is an autonomous sub-network including the hardware accelerators needed by the respective application thread, thus retaining data locality and minimizing congestion. However, HNoC design may lead to exponential increase in the design space size, due to the numerous parameter combinations of the sub-networks and the overall HNoC. In addition, the need for a prototyping framework supporting HNoC simulation with real stimuli is crucial for the accurate system evaluation. Therefore, the goal of this paper is to present (a) a SystemC framework for cycle-accurate simulation of Hierarchical NoCs, accompanied with a NoC API for node mapping on the HNoC; and (b) an exploration flow that targets to reduce the increased design space size. By using the Rician Denoising algorithm for MRI scans as a case study, the proposed DSE flow could achieve up to 2× and 1.48× time and power improvements respectively, as compared to a typical DSE flow.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117330216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A power estimation technique for cycle-accurate higher-abstraction SystemC-based CPU models","authors":"Efstathios Sotiriou-Xanthopoulos, Shalina Percy Delicia, P. Figuli, K. Siozios, G. Economakos, J. Becker","doi":"10.1109/SAMOS.2015.7363661","DOIUrl":"https://doi.org/10.1109/SAMOS.2015.7363661","url":null,"abstract":"Due to the ever-increasing complexity of embedded system design and the need for rapid system evaluations in early design stages, the use of simulation models known as Virtual Platforms (VPs) has been of utmost importance as they enable system modeling at higher abstraction levels. Since a typical VP features multiple interdependent components, VP libraries have been utilized in order to provide off-the-shelf models of commonly-used hardware components, such as CPUs. However, CPU power estimation is not adequately supported by existing VP libraries. In addition, existing power characterization techniques require architectural details which are not always available in early design stages. To address this issue, this paper proposes a technique for power annotation of CPU models targeting SystemC/TLM libraries in order to enable the accurate power estimation at higher abstraction levels. By using a set of benchmarks on a power-annotated SystemC/TLM model of Xilinx Microblaze soft-processor, it is shown that the proposed approach can achieve accurate power estimation in comparison to the real-system power measurements as the estimation error ranges from 0.47% up to 6.11% with an average of 2%.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114980713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}