2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)最新文献

筛选
英文 中文
Energy-aware-task-parallelism for efficient dynamic voltage, and frequency scaling, in CGRAs CGRAs中高效动态电压和频率缩放的能量感知任务并行性
Syed M. A. H. Jafri, Muhammad Adeel Tajammul, A. Hemani, K. Paul, J. Plosila, H. Tenhunen
{"title":"Energy-aware-task-parallelism for efficient dynamic voltage, and frequency scaling, in CGRAs","authors":"Syed M. A. H. Jafri, Muhammad Adeel Tajammul, A. Hemani, K. Paul, J. Plosila, H. Tenhunen","doi":"10.1109/SAMOS.2013.6621112","DOIUrl":"https://doi.org/10.1109/SAMOS.2013.6621112","url":null,"abstract":"Today, coarse grained reconfigurable architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Each application itself is composed of multiple tasks, spatially mapped to different parts of platform. Providing worst-case operating point to all applications leads to excessive energy and power consumption. To cater this problem, dynamic voltage and frequency scaling (DVFS) is a frequently used technique. DVFS allows to scale the voltage and/or frequency of the device, based on runtime constraints. Recent research suggests that the efficiency of DVFS can be significantly enhanced by combining dynamic parallelism with DVFS. The proposed methods exploit the speedup induced by parallelism to allow aggressive frequency and voltage scaling. These techniques, employ greedy algorithm, that blindly parallelizes a task whenever required resources are available. Therefore, it is likely to parallelize a task(s) even if it offers no speedup to the application, thereby undermining the effectiveness of parallelism. As a solution to this problem, we present energy aware task parallelism. Our solution relies on a resource allocation graphs and an autonomous parallelism, voltage, and frequency selection algorithm. Using resource allocation graph, as a guide, the autonomous parallelism, voltage, and frequency selection algorithm parallelizes a task only if its parallel version reduces overall application execution time. Simulation results, using representative applications (MPEG4, WLAN), show that our solution promises better resource utilization, compared to greedy algorithm. Synthesis results (using WLAN) confirm a significant reduction in energy (up to 36%), power (up to 28%), and configuration memory requirements (up to 36%), compared to state of the art.","PeriodicalId":382307,"journal":{"name":"2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116538894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
ECONO: Express coherence notifications for efficient cache coherency in many-core CMPs 在多核cmp中表达一致性通知以实现高效的缓存一致性
José L. Abellán, Alberto Ros, Juan Fernández Peinador, M. Acacio
{"title":"ECONO: Express coherence notifications for efficient cache coherency in many-core CMPs","authors":"José L. Abellán, Alberto Ros, Juan Fernández Peinador, M. Acacio","doi":"10.1109/SAMOS.2013.6621128","DOIUrl":"https://doi.org/10.1109/SAMOS.2013.6621128","url":null,"abstract":"It is commonly stated that a directory-based coherence protocol is the design of choice to provide maximum performance in coherence maintenance for shared-memory many-core CMPs. Nevertheless, new solutions are emerging to achieve acceptable levels of on-chip area overhead and energy consumption to also meet scalability. In this work, we propose the Express COherence NOtification (ECONO) protocol, a coherence protocol aimed at providing high performance with minimal on-chip area and energy consumption for superior scalability. To maintain coherence, ECONO relies on express coherence notifications which are broadcast atomically over a dedicated lightweight and power-efficient on-chip network leveraging state-of-the-art technology. We implement and evaluate ECONO utilizing full-system simulation, a representative set of benchmarks, and compare it against two contemporary coherence protocols: Hammer and Directory. While ECONO achieves slightly better performance than Directory, our proposal does not need to encode sharer sets like in Hammer, saving significant on-chip area and energy even when considering the extra hardware resources required by ECONO. Projections for a 1024-core CMP reveal that, in comparison to one of the most scalable directory-based protocols to date, ECONO entails more than 2× less on-chip storage overhead while keeping with reasonable power dissipation.","PeriodicalId":382307,"journal":{"name":"2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116547934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
General purpose computing on low-power embedded GPUs: Has it come of age? 基于低功耗嵌入式gpu的通用计算:它已经成熟了吗?
Arian Maghazeh, Unmesh D. Bordoloi, P. Eles, Zebo Peng
{"title":"General purpose computing on low-power embedded GPUs: Has it come of age?","authors":"Arian Maghazeh, Unmesh D. Bordoloi, P. Eles, Zebo Peng","doi":"10.1109/SAMOS.2013.6621099","DOIUrl":"https://doi.org/10.1109/SAMOS.2013.6621099","url":null,"abstract":"In this paper we evaluate the promise held by low-power GPUs for non-graphic workloads that arise in embedded systems. Towards this, we map and implement 5 benchmarks, that find utility in very different application domains, to an embedded GPU. Our results show that apart from accelerated performance, embedded GPUs are promising also because of their energy efficiency which is an important design goal for battery-driven mobile devices. We show that adopting the same optimization strategies as those used for programming high-end GPUs might lead to worse performance on embedded GPUs. This is due to restricted features of embedded GPUs, such as, limited or no user-defined memory, small instruction-set, limited number of registers, among others. We propose techniques to overcome such challenges, e.g., by distributing the workload between GPUs and multi-core CPUs, similar to the spirit of heterogeneous computation.","PeriodicalId":382307,"journal":{"name":"2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124875931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 58
Design of a unified transport triggered processor for LDPC/turbo decoder LDPC/turbo解码器统一传输触发处理器的设计
S. Shahabuddin, Janne Janhunen, Muhammet Fatih Bayramoglu, M. Juntti, Amanullah Ghazi, O. Silvén
{"title":"Design of a unified transport triggered processor for LDPC/turbo decoder","authors":"S. Shahabuddin, Janne Janhunen, Muhammet Fatih Bayramoglu, M. Juntti, Amanullah Ghazi, O. Silvén","doi":"10.1109/SAMOS.2013.6621137","DOIUrl":"https://doi.org/10.1109/SAMOS.2013.6621137","url":null,"abstract":"This paper summarizes the design of a programmable processor with transport triggered architecture (TTA) for decoding LDPC and turbo codes. The processor architecture is designed in such a manner that it can be programmed for LDPC or turbo decoding for the purpose of internetworking and roaming between different networks. The standard trellis based maximum a posteriori (MAP) algorithm is used for turbo decoding. Unlike most other implementations, a supercode based sum-product algorithm is used for the check node message computation for LDPC decoding. This approach ensures the highest hardware utilization of the processor architecture for the two different algorithms. Up to our knowledge, this is the first attempt to design a TTA processor for the LDPC decoder. The processor is programmed with a high level language to meet the time-to-market requirement. The optimization techniques and the usage of the function units for both algorithms are explained in detail. The processor achieves 22.64 Mbps throughput for turbo decoding with a single iteration and 10.12 Mbps throughput for LDPC decoding with five iterations for a clock frequency of 200 MHz.","PeriodicalId":382307,"journal":{"name":"2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131340253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信