2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP): Latest Papers, Page 10

The UA?CG Workflow: High Performance Molecular Dynamics of Coarse-Grained Polymers

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.127

David Ozog, A. Malony, M. Guenza

Citations: 2

A Simple Activation/Deactivation Prefetching Scheme for Chip Multiprocessors

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.47

Vicent Selfa, Crispín Gómez Requena, M. E. Gómez, J. Sahuquillo

Abstract: Prefetching significantly reduces the memory latency of a wide range of applications and thus increases system performance. However, as a speculative technique, prefetching may also noticeably increase the number of memory accesses, which in turn may negatively affect main memory bandwidth consumption, performance, and power. Main memory bandwidth is a critical resource, especially in current multicore processors, since memory requests from all the cores, both prefetch and demand requests, compete for access to the DRAM banks. Consequently, demand requests may be delayed, hurting system performance. This work proposes the Activation/Deactivation Policies (ADP) scheme for hardware prefetchers in multicore processors. The scheme relies on policies that turn on the prefetcher of a given core when prefetches are expected to improve performance, and turn it off when performance is expected to improve scarcely or not at all. The proposed mechanism effectively reduces the memory bandwidth requirements of some cores with respect to a typical always-prefetching mechanism, making extra bandwidth available to the co-runners. Results on a four-core processor show that ADP prefetching achieves performance within ±2.5% of always prefetching while significantly reducing the memory bandwidth consumed by useless prefetches. Moreover, in some applications this reduction is as much as 50%. ADP prefetching is applicable to stream-based prefetchers, global-history-buffer delta correlation prefetchers, and PC-based stride prefetchers.
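The abstract above describes per-core policies that switch a prefetcher on or off but does not state the policies themselves. The following is only a toy sketch of what such a gate might look like, assuming a prefetch-accuracy rule evaluated once per interval. The class name `PrefetcherGate`, the two thresholds, and the counters are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PrefetcherGate:
    """Hypothetical per-core on/off gate; thresholds are illustrative assumptions."""
    off_below: float = 0.30  # assumed: deactivate when accuracy drops below this
    on_above: float = 0.50   # assumed: reactivate when accuracy rises above this
    active: bool = True

    def end_of_interval(self, useful: int, issued: int) -> bool:
        """Update the gate from one interval's counters and return the new state.

        issued: prefetches issued (or, while off, prefetches that would have been issued)
        useful: how many of those were (or would have been) hit by a demand access
        """
        if issued == 0:
            return self.active  # no evidence in this interval; keep the current state
        accuracy = useful / issued
        if self.active and accuracy < self.off_below:
            self.active = False
        elif not self.active and accuracy > self.on_above:
            self.active = True
        return self.active


def simulate(intervals):
    """Run one gate over a sequence of (useful, issued) pairs; return its state history."""
    gate = PrefetcherGate()
    return [gate.end_of_interval(useful, issued) for useful, issued in intervals]
```

Using two separate thresholds gives the gate hysteresis, so a core whose accuracy hovers near one value does not toggle every interval. A real hardware policy would also need some way to estimate usefulness while the prefetcher is off, which this sketch simply takes as an input.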

Citations: 2

Energy Aware Scheduling of HPC Tasks in Decentralised Cloud Systems

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.83

Aeshah Alsughayyir, T. Erlebach

Citations: 10

Introducing Parallelism by Using REPARA C++11 Attributes

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.115

M. Danelutto, José Daniel García Sánchez, Luis Miguel Sánchez, Rafael Sotomayor, M. Torquati

Citations: 20

Computing Multiple Accumulated Cost Surfaces with Graphics Processing Units

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.76

G. Trunfio, G. Sirakoulis

Citations: 6

An OpenACC Optimizer for Accelerating Histogram Computation on a GPU

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.14

Kei Ikeda, Fumihiko Ino, K. Hagihara

Abstract: This paper presents a source-to-source OpenACC optimizer that automatically optimizes histogram computation code for a graphics processing unit (GPU). Parallel histogram codes typically deploy multiple copies of the histogram and update them with atomic operations. This duplication method can be implemented in OpenACC. However, the structure of sequential code blocks must be rewritten manually owing to limitations of OpenACC directives. Such rewritten code does not always achieve the highest performance on arbitrary platforms, and thus the duplication method degrades the performance portability of the code. To tackle this issue, we propose an optimizer that identifies histogram-related blocks in naive OpenACC code and automatically rewrites the detected blocks so that multiple copies of histograms can be exploited for acceleration. In experiments, we apply our optimizer to three practical applications and investigate their performance on three platforms: an NVIDIA GPU, an AMD GPU, and an Intel CPU. Experimental results show that our automated approach helps OpenACC codes maximize the performance of histogram computation, thereby enhancing the performance portability of the code.
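The duplication method mentioned in the abstract above can be illustrated on a CPU with a minimal sketch. This is not OpenACC and not the paper's optimizer: it is a plain Python analogue in which each worker owns one private histogram copy, so no atomic updates are needed, and the copies are merged at the end. The function names and the `num_copies` parameter are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_histogram(chunk, num_bins):
    """Count values of one chunk into a private histogram copy."""
    hist = [0] * num_bins
    for value in chunk:
        hist[value] += 1  # values are assumed to be integers in [0, num_bins)
    return hist


def histogram_with_copies(data, num_bins, num_copies=4):
    """Build num_copies private histograms concurrently, then reduce them bin by bin."""
    # Strided split: worker k receives elements k, k + num_copies, k + 2*num_copies, ...
    chunks = [data[k::num_copies] for k in range(num_copies)]
    with ThreadPoolExecutor(max_workers=num_copies) as pool:
        partials = list(pool.map(lambda c: partial_histogram(c, num_bins), chunks))
    # Merge step: sum the same bin across all copies.
    return [sum(bin_counts) for bin_counts in zip(*partials)]
```

On a GPU the copies are usually shared by groups of threads, which is why the abstract still mentions atomic operations. The sketch only shows the split, private update, and merge structure that reduces contention on any single bin.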

Citations: 7

Exploring Energy Reduction in Future Technology Nodes via Voltage Scaling with Application to 10nm

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.108

Gulay Yalcin, S. Rethinagiri, Oscar Palomar, O. Unsal, A. Cristal, D. Milojevic

Abstract: Voltage and frequency downscaling is a well-known scheme for reducing the energy consumption of a computer system. However, the amount of energy saved depends first on the technology node used. Also, when the voltage level is below the safe margin, instructions need to be re-executed due to voltage-related faults, which can introduce additional energy overheads and thus nullify the expected energy gains from the lower voltage. Moreover, both fault recovery and frequency reduction impact the performance of the application. In this study, we first evaluate the error rate of several sub-circuits (i.e., functional units) at the future n10 technology node. To reduce the performance impact, we reduce the voltage and frequency of each sub-circuit at a fine granularity while keeping the rest of the system at its nominal voltage and frequency. In this way, in an out-of-order architecture, instruction-level parallelism can mask the performance impact of a relatively slow functional unit. According to our evaluations, the energy consumption of functional units can be reduced by up to 92% with only 8% performance degradation.
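The trade-off in the abstract above, where re-execution after voltage-related faults can cancel the savings from a lower voltage, can be made concrete with a first-order model. The sketch assumes the textbook approximation that dynamic energy per operation scales with the square of the supply voltage, and that a faulty operation is simply retried until it succeeds. It ignores leakage and recovery overhead and is not the paper's evaluation model; `relative_energy` and its parameters are hypothetical names.

```python
def relative_energy(voltage, nominal_voltage, fault_prob):
    """Expected energy per completed operation, relative to nominal voltage with no faults.

    Assumptions (illustrative only):
      - dynamic energy per attempt scales as (voltage / nominal_voltage) ** 2
      - each attempt fails independently with probability fault_prob and is retried,
        so the expected number of attempts is 1 / (1 - fault_prob)
    """
    if not 0.0 <= fault_prob < 1.0:
        raise ValueError("fault_prob must be in [0, 1)")
    energy_per_attempt = (voltage / nominal_voltage) ** 2
    expected_attempts = 1.0 / (1.0 - fault_prob)
    return energy_per_attempt * expected_attempts
```

Under these assumptions, halving the voltage cuts energy to a quarter when no faults occur, but a 75% fault rate at that voltage brings the expected energy back to the nominal level.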

Citations: 7

Application of a Technique for Secure Embedded Device Design Based on Combining Security Components for Creation of a Perimeter Protection System

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.99

V. Desnitsky, A. Chechulin, Igor Kotenko, D. Levshun, Maxim Kolomeec

Citations: 6

Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.75

Adriano Vogel, Dalvan Griebler, Carlos A. F. Maron, C. Schepke, L. G. Fernandes

Citations: 39

A Hardware Scheduler for Multicore Block Cipher Processor

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-02-01 DOI: 10.1109/PDP.2016.59

Sang Muk Lee, E. Ko, Seung Eun Lee

Citations: 4