2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): Latest Papers

Deploying OpenMP on an embedded multicore accelerator
S. Agathos, V. Dimakopoulos, Aggelos Mourelis, Alexandros Papadogiannakis
DOI: 10.1109/SAMOS.2013.6621121 (https://doi.org/10.1109/SAMOS.2013.6621121)
Published: 2013-07-15
Semantic Scholar paper ID: 121666860
Abstract: Multiprocessor systems-on-chip (MPSoC) are now considered first-class citizens both in the embedded systems and in the high-performance computing arenas, in the form of specialized or general-purpose accelerators. Programming models for such systems are currently a hot research topic and, as a general rule, require deep programmer knowledge of the underlying hardware architecture. In this paper we present the implementation of OpenMP, one of the most intuitive and productive programming models, on the STHORM accelerator. This particular platform provides a shared-memory substrate, which OpenMP requires. An innovative feature of our design is the seamless deployment of the OpenMP model at both the host and the fabric sides, which provides the programmer with a simple but effective interface for offloading and executing OpenMP kernels on the MPSoC. The optimized runtime environment provides full OpenMP support despite its small footprint (less than 10KB for a 16-core cluster) and can sustain close-to-ideal speedups in computationally intensive applications. We detail the design issues we faced, along with their solutions, given the limited available resources.
Citations: 20
A scalable FFT processor architecture for OFDM based communication systems
Deepak Revanna, Omer Anjum, Manuele Cucchi, Roberto Airoldi, J. Nurmi
DOI: 10.1109/SAMOS.2013.6621101 (https://doi.org/10.1109/SAMOS.2013.6621101)
Published: 2013-07-15
Semantic Scholar paper ID: 130783397
Abstract: Modern wireless standards are predominantly based on OFDM communication systems. Many recent mobile devices support multiple wireless standards and demand efficient transceivers. Hence, in a communication transceiver the baseband hardware needs to be scalable and efficient across multiple standards. In an OFDM based transceiver, FFT computation is one of the most computationally intensive and power hungry modules. Design of FFT hardware is a challenging task while balancing design parameters such as speed, power, area, flexibility and scalability. The research work in this paper proposes a novel, scalable radix-2 N-point FFT processor architecture. The architecture design is based on an approach to balance various specified design parameters to meet the requirements of SDR platforms supporting multiple wireless standards. The FFT processor was designed and prototyped using VHDL on an Altera Stratix V FPGA device 5SGSMD5K2F40C2. The processor operates at a maximum frequency of 200 MHz, uses less than 1% of FPGA device resources and meets the performance requirements of multiple wireless standards such as IEEE 802.11a/g, IEEE 802.16e, 3GPP-LTE, DAB and DVB. The proposed architecture outperforms the existing fixed and variable length FFT processors in terms of speed, flexibility and scalability.
Citations: 22
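Note: the radix-2 N-point decomposition named in the abstract above can be illustrated in software. The following is a minimal Python sketch of a textbook recursive radix-2 decimation-in-time FFT. It is not the authors' hardware architecture, and the function name fft_radix2 is an assumption made for illustration only.

```python
import cmath


def fft_radix2(x):
    """Textbook recursive radix-2 decimation-in-time FFT (illustrative only).

    The input length must be a power of two; this is a software sketch,
    not the processor architecture described in the paper.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n % 2 != 0:
        raise ValueError("length must be a power of two")
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out
```

Each recursion level performs N/2 butterflies, which is the operation a hardware radix-2 design typically implements as its basic processing element.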
Fast transaction-level dynamic power consumption modelling in priority preemptive wormhole switching networks on chip
J. Harbin, L. Indrusiak
DOI: 10.1109/SAMOS.2013.6621120 (https://doi.org/10.1109/SAMOS.2013.6621120)
Published: 2013-07-15
Semantic Scholar paper ID: 131574549
Abstract: This paper specifies an architecture for power consumption modelling integrated within cycle-approximate transaction level modelling for network-on-chip (NoC) simulation. NoC simulations during design validation have traditionally been limited to very short durations, due to the necessity to perform cycle-accurate simulation to represent fully the low level system simulated. Due to the high proportion of overall system power that may be consumed by a busy NoC, high-fidelity NoC power modelling is especially important to accurately assess the effectiveness of link coding and other strategies to reduce NoC power consumption. The paper describes the extension of a cycle-approximate TLM methodology to encompass power modelling in NoCs, considering its operation with real application traffic. The proposed scheme avoids modelling of flit-by-flit progress during non-preemptive periods of packet transmission. The simulation performance and accuracy are contrasted with theoretical models and a flit-by-flit scheme (in which each flow control digit passing along a bus wire is simulated). The power consumption reduction delivered by encoding schemes such as bus-invert coding is considered and compared with analytical models to verify the correct performance of the simulation models.
Citations: 7
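Note: the abstract above mentions bus-invert coding as a link-coding scheme. The following is a minimal Python sketch of the general technique, assuming an all-zero initial bus state and a separate invert line. The function names are hypothetical, and this is not the simulation model from the paper.

```python
def bus_invert_encode(words, width):
    """Encode words for a bus of `width` data lines plus one invert line.

    A word is sent inverted when it would otherwise toggle more than
    half of the data lines relative to the previous bus value.
    Assumes the bus starts at all zeros (an illustrative assumption).
    """
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for w in words:
        w &= mask
        toggles = bin(w ^ prev).count("1")  # Hamming distance to previous bus value
        if toggles > width // 2:
            w ^= mask  # send the complement instead
            invert = 1
        else:
            invert = 0
        encoded.append((w, invert))
        prev = w
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus value, invert bit) pairs."""
    mask = (1 << width) - 1
    return [(w ^ mask) if invert else w for w, invert in encoded]
```

For example, on a 4-bit bus the sequence 0b1111, 0b0000, 0b1110 is sent as data values 0b0000, 0b0000, 0b0001 with the invert line set for the first and third words, so the data lines toggle far less often than with the unencoded sequence.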
Efficient runtime support for embedded MPSoCs
D. Theodoropoulos, Polyvios Pratikakis, D. Pnevmatikatos
DOI: 10.1109/SAMOS.2013.6621119 (https://doi.org/10.1109/SAMOS.2013.6621119)
Published: 2013-07-15
Semantic Scholar paper ID: 130983053
Abstract: Recently, many software runtime systems have been proposed that allow developers to efficiently map applications to contemporary consumer electronic devices and high-performance academic processing platforms. Most of these runtime systems employ advanced scheduling techniques for automatic task assignment to all available processing elements. However, they focus on a particular environment and architecture, and it is not easy to port them to reconfigurable embedded MPSoCs. As a consequence, in the embedded community, researchers implement hardwired application-specific task schedulers, which cannot be used by other embedded MPSoCs. To address this problem, in this paper we propose a lightweight runtime software framework for reconfigurable shared-memory MPSoCs that integrate a master embedded processor connected to slave cores. Similarly to many of the aforementioned advanced runtime systems, we adopt a task-based programming model that uses simple, pragma-based annotations of the application software in order to dynamically resolve task dependencies. Our runtime system supports heterogeneity in the hardware resources, and is also low-overhead to account for possible limitations in their processing capabilities and available on-chip memory. To evaluate our proposal, we have prototyped an MPSoC with seven slaves on a Xilinx ML605 FPGA board. We run three micro-benchmarks that achieve a performance speedup of 3.8x, 7x and 5.8x, and energy consumption of 27%, 14% and 18% respectively, compared to a single-core baseline system with no runtime support.
Citations: 4
A just-in-time modulo scheduling for virtual coarse-grained reconfigurable architectures
R. Ferreira, Vinicius Duarte, Waldir Meireles, M. Pereira, L. Carro, Stephan Wong
DOI: 10.1109/SAMOS.2013.6621122 (https://doi.org/10.1109/SAMOS.2013.6621122)
Published: 2013-07-15
Semantic Scholar paper ID: 116096042
Abstract: In the past decade, most solutions concerning the mapping of compute-intensive loop kernels to accelerators have used heuristics and compiler-based strategies. These require that most of the decisions be taken at design time, thus precluding efficient solutions that can take run-time information into account. Any success in accelerating such applications greatly depends on two steps: extracting the loops and mapping them into the architecture. This last step is a challenge in itself, since it is an NP-complete problem. In this paper, we propose a runtime solution that can provide speedups of 3 to 6 orders of magnitude for the mapping step when compared to the state of the art, at minimal performance degradation, by the combined usage of 3 distinct mechanisms: 1) a simple and efficient modulo scheduling heuristic, 2) a crossbar network, which simplifies the placement and routing, 3) a virtual coarse-grained reconfigurable architecture (CGRA). Additionally, since the CGRA is a virtual layer on top of an FPGA, it is possible to use any off-the-shelf FPGA without the need for special tools or IP solutions. Although the mapping is NP-complete even for crossbar-based CGRAs, experimental results demonstrate a huge reduction in compilation time: whereas previous solutions require seconds to map the applications, our solution requires only microseconds to find near-optimal schedules. Besides the speedup, the proposed solution enables the use of just-in-time compilation, hence it is intrinsically adaptive to a changing scenario.
Citations: 12
On-demand system reliability: The DeSyRe project
I. Sourdis
DOI: 10.1109/SAMOS.2013.6621130 (https://doi.org/10.1109/SAMOS.2013.6621130)
Published: 2013-07-15
Semantic Scholar paper ID: 125052840
Abstract: Summary form only given. The DeSyRe project builds on-demand adaptive, reliable Systems-on-Chips. In response to the current semiconductor technology trends that make chips less reliable, DeSyRe describes a new generation of reliable-by-design systems, at a reduced power and performance cost. This is achieved through the following main contributions. DeSyRe defines a fault-tolerant system architecture built out of unreliable components, rather than aiming at totally fault-free, and hence more costly, chips. In addition, DeSyRe systems are on-demand adaptive to various types and densities of faults, as well as to other system constraints and application requirements. For leveraging on-demand adaptation/customization and reliability at reduced cost, a new dynamically reconfigurable substrate is proposed and combined with runtime system software support. The above define a generic and repeatable design framework for a large variety of SoCs, which, within the project, is applied to two medical SoCs with high reliability constraints and diverse performance and power requirements. This talk gives an overview of DeSyRe and describes our current research findings.
Citations: 0
Modeling pipelined application with Synchronous Data Flow graphs
M. Lattuada, Fabrizio Ferrandi
DOI: 10.1109/SAMOS.2013.6621105 (https://doi.org/10.1109/SAMOS.2013.6621105)
Published: 2013-07-15
Semantic Scholar paper ID: 126138638
Abstract: Streaming applications can efficiently exploit multiprocessor architectures by means of pipelined parallelism, but designing this type of application can be a hard task. Different subproblems have indeed to be solved: partitioning, mapping, scheduling and pipeline stage assignment. For this reason, high level abstraction models are adopted during design flow since they simplify this process by hiding most of the architectural details. Synchronous Data Flow (SDF) graphs, widely adopted to describe streaming applications, naturally model only their partitioning, so they usually have to be integrated with other types of representations. In this paper Pipelined Application Modeling (PAM), a methodology to create a Synchronous Data Flow graph describing all the aspects of a pipelined application, is presented. The methodology starts from the SDF graph describing the partitioning of the application and enriches it with new actors and channels detailing the mapping, the scheduling and the pipeline stage assignment of the considered solution. The obtained SDF graph, describing all the aspects of the solution in a formal and compact way, facilitates the evaluation of different solutions during design space exploration.
Citations: 9
Faster unicores are still needed
André Seznec
DOI: 10.1109/SAMOS.2013.6621098 (https://doi.org/10.1109/SAMOS.2013.6621098)
Published: 2013-07-15
Semantic Scholar paper ID: 122281859
Abstract: Summary form only given. For the last decade, the advent of the multicore era has driven most of the architecture research effort from the high end processor industry as well as from the computer architecture research community. However, many of the workloads executed on our servers, PCs, smartphones and tablets are still inherently sequential or multiprogrammed; even parallel applications require high sequential performance. Therefore, there are new opportunities for research in uniprocessor microarchitecture. In this presentation, I will first present the motivations of the ERC grant DAL (Defying Amdahl's Law). I will then present two research actions on processor core architecture that are ongoing within the DAL framework in the INRIA/IRISA ALF group: revisiting value prediction and out-of-order execution of predicated instruction sets.
Citations: 0
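Note: the abstract above refers to Amdahl's Law, which bounds the speedup of a program when only a fraction p of its execution time can be parallelized across n cores: speedup = 1 / ((1 - p) + p / n). The following is a minimal Python sketch of that formula; the function name amdahl_speedup is an assumption made for illustration and does not come from the talk.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup under Amdahl's Law.

    parallel_fraction: share of single-core execution time that can be
        parallelized, between 0 and 1.
    cores: number of cores applied to the parallel part, at least 1.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)
```

With 90% of the work parallelizable, 16 cores give a speedup of only 6.4, and no number of cores can exceed 10. The sequential part dominates, which is the motivation the abstract gives for continued research on single-core performance.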
SIMD made explicit
Luc Waeijen, Dongrui She, H. Corporaal, Yifan He
DOI: 10.1109/SAMOS.2013.6621142 (https://doi.org/10.1109/SAMOS.2013.6621142)
Published: 2013-07-15
Semantic Scholar paper ID: 132391799
Abstract: Low energy consumption has become one of the most important topics in computing. With single CPUs consuming as much as 115 watts, engineers have been looking for ways to reduce energy consumption while maintaining high computational performance. Often wide SIMD architectures are used to achieve this, exploiting data parallelism to keep the required clock frequency low for a given compute constraint. In this paper, we propose a wide SIMD architecture with explicit datapath to further optimize energy efficiency without sacrificing computation power. To have a detailed comparison, both the proposed wide SIMD architecture and its transparent bypassing counterpart are implemented in HDL and synthesized with a TSMC 40nm low power library. The power estimation is derived from actual toggle rates generated by post-synthesis simulation. Our experimental results show that with explicit bypassing the overall energy consumption can be reduced by up to 44% compared to the corresponding SIMD architecture with transparent bypassing.
Citations: 11
Pulse-length determination techniques in the rectangular single event transient fault model
Alireza Rohani, H. Kerkhoff, Enrico Costenaro, D. Alexandrescu
DOI: 10.1109/SAMOS.2013.6621125 (https://doi.org/10.1109/SAMOS.2013.6621125)
Published: 2013-07-15
Semantic Scholar paper ID: 131088923
Abstract: One of the well-known models to represent the Single Event Transient (SET) phenomenon at the logic level is the rectangular pulse model. However, the pulse length in this model has a vital contribution to the accuracy and validity of the rectangular pulse model. The work presented in this paper develops two approaches for determination of the pulse length of the rectangular pulse model used in SET faults. The first determination approach has been extracted from radiation testing along with transistor-level SET analysis tools. The second determination approach has been elicited from the asymptotic analytical behaviour of SETs in a 45-nm CMOS process. The results show that applying these two pulse-length determination approaches to the rectangular pulse model causes the fault injection results to converge much faster (up to sixteen times) compared to other conventional approaches.
Citations: 5