2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): Latest Papers, Page 3

Deploying OpenMP on an embedded multicore accelerator

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621121

S. Agathos, V. Dimakopoulos, Aggelos Mourelis, Alexandros Papadogiannakis

Abstract: Multiprocessor systems-on-chip (MPSoC) are now considered first-class citizens both in the embedded systems and in the high-performance computing arenas, in the form of specialized or general-purpose accelerators. Programming models for such systems are currently a hot research topic and, as a general rule, require deep programmer knowledge of the underlying hardware architecture. In this paper we present the implementation of OpenMP, one of the most intuitive and productive programming models, on the STHORM accelerator. This particular platform provides a shared-memory substrate which OpenMP requires. An innovative feature of our design is the deployment of the OpenMP model both at the host and the fabric sides, in a seamless way, which provides the programmer with a simple but effective interface for offloading and executing OpenMP kernels on the MPSoC. The optimized runtime environment provides full OpenMP support despite its small footprint (less than 10KB for a 16-core cluster) and can sustain close-to-ideal speedups in computationally intensive applications. We detail the design issues we faced along with their solutions, given the limited available resources.

Citations: 20

A scalable FFT processor architecture for OFDM based communication systems

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621101

Deepak Revanna, Omer Anjum, Manuele Cucchi, Roberto Airoldi, J. Nurmi

Abstract: Modern wireless standards are predominantly based on OFDM communication systems. Various mobile devices in recent times support multiple wireless standards and demand efficient transceivers. Hence, in a communication transceiver the baseband hardware needs to be scalable and efficient across multiple standards. In an OFDM based transceiver, FFT computation is one of the most computationally intensive and power hungry modules. Design of FFT hardware is a challenging task while balancing design parameters such as speed, power, area, flexibility and scalability. The research work in this paper proposes a scalable radix-2 N-point novel FFT processor architecture. The architecture design is based on an approach to balance various specified design parameters to meet the requirements of SDR platforms supporting multiple wireless standards. The FFT processor was designed and prototyped using VHDL on an Altera Stratix V FPGA device 5SGSMD5K2F40C2. The processor operates at a maximum frequency of 200 MHz, uses less than 1% of FPGA device resources and meets the performance requirements of multiple wireless standards such as IEEE 802.11a/g, IEEE 802.16e, 3GPP-LTE, DAB and DVB. The proposed architecture outperforms the existing fixed and variable length FFT processors in terms of speed, flexibility and scalability.
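For readers unfamiliar with the radix-2 decomposition this abstract refers to, the following is a minimal software sketch of the textbook recursive decimation-in-time FFT. It is not the authors' hardware architecture; the function name `fft_radix2` is an illustrative assumption, and the only property it shares with the paper is the radix-2 split of an N-point transform into two N/2-point transforms.

```python
import cmath


def fft_radix2(x):
    """Textbook recursive radix-2 decimation-in-time FFT.

    Illustrative sketch only: len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k applied to the odd half (butterfly step).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    # An impulse transforms to a flat spectrum of ones.
    print(fft_radix2([1, 0, 0, 0]))
```

Because every stage halves the problem size, the same butterfly structure serves any power-of-two length, which is the kind of scalability across transform sizes that multi-standard OFDM receivers need.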

Citations: 22

Fast transaction-level dynamic power consumption modelling in priority preemptive wormhole switching networks on chip

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621120

J. Harbin, L. Indrusiak

Abstract: This paper specifies an architecture for power consumption modelling integrated within cycle-approximate transaction level modelling for network-on-chip (NoC) simulation. NoC simulations during design validation have traditionally been limited to very short durations, due to the necessity to perform cycle-accurate simulation to represent fully the low level system simulated. Due to the high proportion of overall system power that may be consumed by a busy NoC, high-fidelity NoC power modelling is especially important to accurately assess the effectiveness of link coding and other strategies to reduce NoC power consumption. The paper describes the extension of a cycle-approximate TLM methodology to encompass power modelling in NoCs, considering its operation with real application traffic. The proposed scheme avoids modelling of flit-by-flit progress during non-preemptive periods of packet transmission. The simulation performance and accuracy are contrasted with theoretical models and a flit-by-flit scheme (in which each flow control digit passing along a bus wire is simulated). The power consumption reduction delivered by encoding schemes such as bus-invert coding is considered and compared with analytical models to verify the correct performance of the simulation models.
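The bus-invert coding mentioned in this abstract can be illustrated with a short sketch. This is the generic textbook scheme, not the paper's simulation model: a word is sent inverted, with an extra invert line raised, whenever sending it unmodified would toggle more than half of the data wires. The function names, the 8-bit default width, and the assumption that the bus starts at all zeros are illustrative choices.

```python
def bus_invert_encode(words, width=8):
    """Encode words with bus-invert coding and count wire transitions.

    Returns (encoded, transitions), where encoded is a list of
    (bus_value, invert_bit) pairs. The invert line is counted as a wire.
    Assumes the data wires and the invert line start at 0.
    """
    mask = (1 << width) - 1
    prev_bus, prev_inv = 0, 0
    encoded, transitions = [], 0
    for w in words:
        w &= mask
        # Hamming distance between the current bus state and the new word.
        flips = bin(prev_bus ^ w).count("1")
        if flips > width // 2:
            bus, inv = (~w) & mask, 1   # cheaper to send the complement
        else:
            bus, inv = w, 0
        transitions += bin(prev_bus ^ bus).count("1") + (prev_inv ^ inv)
        encoded.append((bus, inv))
        prev_bus, prev_inv = bus, inv
    return encoded, transitions


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [((~bus) & mask) if inv else bus for bus, inv in encoded]


def raw_transitions(words, width=8):
    """Wire transitions when the same words are sent without encoding."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for w in words:
        total += bin(prev ^ (w & mask)).count("1")
        prev = w & mask
    return total


if __name__ == "__main__":
    data = [0x00, 0xFF, 0x00, 0xFF]
    enc, coded = bus_invert_encode(data)
    print(raw_transitions(data), coded)  # 24 3
```

Since dynamic power on a link scales with the number of wire transitions, comparing the two counts gives a rough feel for the kind of saving such link coding aims at; the actual savings reported in the paper depend on real application traffic.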

Citations: 7

Efficient runtime support for embedded MPSoCs

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621119

D. Theodoropoulos, Polyvios Pratikakis, D. Pnevmatikatos

Abstract: Recently, many software runtime systems have been proposed that allow developers to efficiently map applications to contemporary consumer electronic devices and high-performance academic processing platforms. Most of these runtime systems employ advanced scheduling techniques for automatic task assignment to all available processing elements. However, they focus on a particular environment and architecture, and it is not easy to port them to reconfigurable embedded MPSoCs. As a consequence, in the embedded community, researchers implement hardwired application-specific task schedulers, which cannot be used by other embedded MPSoCs. To address this problem, in this paper we propose a lightweight runtime software framework for reconfigurable shared-memory MPSoCs that integrate a master embedded processor connected to slave cores. Similarly to many of the aforementioned advanced runtime systems, we adopt a task-based programming model that uses simple, pragma-based annotations of the application software, in order to dynamically resolve task dependencies. Our runtime system supports heterogeneity in the hardware resources, and is also low-overhead to account for possible limitations in their processing capabilities and available on-chip memory. To evaluate our proposal, we have prototyped an MPSoC with seven slaves on a Xilinx ML605 FPGA board. We run three micro-benchmarks that achieve a performance speedup of 3.8x, 7x and 5.8x, and energy consumption of 27%, 14% and 18% respectively, compared to a single-core baseline system with no runtime support.

Citations: 4

A just-in-time modulo scheduling for virtual coarse-grained reconfigurable architectures

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621122

R. Ferreira, Vinicius Duarte, Waldir Meireles, M. Pereira, L. Carro, Stephan Wong

Abstract: In the past decade, most solutions concerning the mapping of the compute-intensive loop kernels to accelerators have used heuristics and compiler-based strategies. These facts require that most of the decisions be taken at design time, thus precluding efficient solutions that can take run-time information into account. Any success in accelerating such applications greatly depends on two steps, extracting the loops and mapping them into the architecture. This last step is a challenge in itself since it is an NP-complete problem. In this paper, we propose a runtime solution that can provide speedups of 3 to 6 orders of magnitude for the mapping step when compared to the state-of-the-art at minimal performance degradation, by the combined usage of 3 distinct mechanisms: 1) a simple and efficient modulo scheduling heuristic, 2) a crossbar network, which simplifies the placement and routing, 3) a virtual coarse-grained reconfigurable architecture (CGRA). Additionally, since the CGRA is a virtual layer on top of an FPGA, it is possible to use any off-the-shelf FPGA without the need of special tools or IP solutions. Although the mapping is NP-complete even for crossbar-based CGRAs, experimental results demonstrate a huge reduction in compilation time: as opposed to previous solutions that require seconds to map the applications, our solution requires only microseconds to find near-optimal schedules. Besides the speedup, the proposed solution enables the use of just-in-time compilation, hence it is intrinsically adaptive to a changing scenario.

Citations: 12

On-Demand system reliability: The DeSyRe project

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621130

I. Sourdis

Abstract: Summary form only given. The DeSyRe project builds on-demand adaptive, reliable Systems-on-Chips. In response to the current semiconductor technology trends that make chips less reliable, DeSyRe describes a new generation of by-design reliable systems, at a reduced power and performance cost. This is achieved through the following main contributions. DeSyRe defines a fault-tolerant system architecture built out of unreliable components, rather than aiming at totally fault-free, and hence more costly chips. In addition, DeSyRe systems are on-demand adaptive to various types and densities of faults, as well as to other system constraints and application requirements. For leveraging on-demand adaptation/customization and reliability at reduced cost, a new dynamically reconfigurable substrate is proposed and combined with runtime system software support. The above define a generic and repeatable design framework for a large variety of SoCs, which, within the project, is applied to two medical SoCs with high reliability constraints and diverse performance and power requirements. In this talk, an overview of DeSyRe and our current research findings are described.

Citations: 0

Modeling pipelined application with Synchronous Data Flow graphs

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621105

M. Lattuada, Fabrizio Ferrandi

Abstract: Streaming applications can efficiently exploit multiprocessor architectures by means of pipelined parallelism, but designing this type of application can be a hard task. Different subproblems have indeed to be solved: partitioning, mapping, scheduling and pipeline stage assignment. For this reason, high level abstraction models are adopted during design flow since they simplify this process by hiding most of the architectural details. Synchronous Data Flow (SDF) graphs, widely adopted to describe streaming applications, naturally model only their partitioning, so they usually have to be integrated with other types of representations. In this paper Pipelined Application Modeling (PAM), a methodology to create a Synchronous Data Flow graph describing all the aspects of a pipelined application, is presented. The methodology starts from the SDF graph describing the partitioning of the application and enriches it with new actors and channels detailing the mapping, the scheduling and the pipeline stage assignment of the considered solution. The obtained SDF graph, describing all the aspects of the solution in a formal and compact way, facilitates the evaluation of different solutions during design space exploration.
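As background for the SDF graphs this abstract builds on, the sketch below computes the repetition vector of a small SDF graph by solving its balance equations. It illustrates standard SDF analysis only and is not the PAM methodology itself; the function name, the `(src, dst, produced, consumed)` channel tuples, and the example actor names are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, channels):
    """Smallest positive firing counts that balance every SDF channel.

    channels: iterable of (src, dst, produced, consumed) token rates.
    For each channel the balance equation is q[src] * produced == q[dst] * consumed.
    Assumes a connected graph; raises ValueError if rates are inconsistent.
    """
    adj = {a: [] for a in actors}
    for src, dst, prod, cons in channels:
        adj[src].append((dst, Fraction(prod, cons)))
        adj[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from the first actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            r = rate[a] * ratio
            if b not in rate:
                rate[b] = r
                stack.append(b)
            elif rate[b] != r:
                raise ValueError("inconsistent token rates")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational rates to the smallest integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    return {a: int(r * scale) for a, r in rate.items()}


if __name__ == "__main__":
    q = repetition_vector(
        ["A", "B", "C"],
        [("A", "B", 2, 3), ("B", "C", 1, 2)],
    )
    print(q)  # {'A': 3, 'B': 2, 'C': 1}
```

A graph enriched with extra actors and channels, as the abstract describes, remains an ordinary SDF graph, so consistency checks of this kind still apply to it during design space exploration.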

Citations: 9

Faster unicores are still needed

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621098

André Seznec

Citations: 0

SIMD made explicit

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621142

Luc Waeijen, Dongrui She, H. Corporaal, Yifan He

Citations: 11

Pulse-length determination techniques in the rectangular single event transient fault model

2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) Pub Date : 2013-07-15 DOI: 10.1109/SAMOS.2013.6621125

Alireza Rohani, H. Kerkhoff, Enrico Costenaro, D. Alexandrescu

Citations: 5