2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP): Latest Publications

The UA↔CG Workflow: High Performance Molecular Dynamics of Coarse-Grained Polymers
David Ozog, A. Malony, M. Guenza
DOI: 10.1109/PDP.2016.127 (https://doi.org/10.1109/PDP.2016.127)
Published: 2016-02-01
Abstract: Our analytically based technique for coarse-graining (CG) polymer simulations dramatically improves spatial and temporal scaling while preserving thermodynamic quantities and bulk properties. The purpose of CG codes is to run more efficient molecular dynamics simulations, yet the research field generally lacks a thorough analysis of how such codes scale relative to full-atom representations. This paper conducts an in-depth performance study of highly realistic polymer melts on modern supercomputing systems. We also present a workflow that integrates our analytical solution for calculating CG forces with new high-performance techniques for mapping back and forth between the atomistic and CG descriptions in LAMMPS. The workflow benefits from the performance of CG while maintaining full-atom accuracy. Our results show speedups of up to 12x over atomistic simulations.
Citations: 2
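The forward (atomistic-to-CG) half of the mapping step mentioned above is commonly done by placing each coarse-grained site at the center of mass of a group of atoms. The sketch below illustrates only that generic idea and is not the authors' LAMMPS implementation; the helper name `map_to_cg`, the grouping of consecutive atoms into fixed-size beads, and the absence of periodic-boundary handling are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def map_to_cg(positions: Sequence[Vec3], masses: Sequence[float],
              atoms_per_bead: int) -> List[Vec3]:
    """Map atom positions to CG bead positions (hypothetical helper).

    Consecutive groups of `atoms_per_bead` atoms form one bead, placed at the
    group's center of mass. Periodic boundaries are ignored, so coordinates
    are assumed to be unwrapped.
    """
    if len(positions) != len(masses):
        raise ValueError("positions and masses must have the same length")
    if atoms_per_bead <= 0 or len(positions) % atoms_per_bead != 0:
        raise ValueError("atom count must be a positive multiple of atoms_per_bead")
    beads = []
    for start in range(0, len(positions), atoms_per_bead):
        group = range(start, start + atoms_per_bead)
        total_mass = sum(masses[i] for i in group)
        # Mass-weighted average of each Cartesian coordinate.
        com = tuple(
            sum(masses[i] * positions[i][k] for i in group) / total_mass
            for k in range(3)
        )
        beads.append(com)
    return beads


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    print(map_to_cg(atoms, [1.0, 1.0, 1.0, 3.0], atoms_per_bead=2))
```

The reverse (CG-to-atomistic) direction is harder because many atomistic configurations share the same bead positions, so it is not sketched here.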
A Simple Activation/Deactivation Prefetching Scheme for Chip Multiprocessors
Vicent Selfa, Crispín Gómez Requena, M. E. Gómez, J. Sahuquillo
DOI: 10.1109/PDP.2016.47 (https://doi.org/10.1109/PDP.2016.47)
Published: 2016-02-01
Abstract: Prefetching significantly reduces the memory latencies of a wide range of applications and thus increases system performance. However, as a speculative technique, prefetching may also noticeably increase the number of memory accesses, which in turn may negatively impact main memory bandwidth consumption, performance, and power. Main memory bandwidth is a critical resource, especially in current multicore processors, since memory requests from all the cores, both prefetch and demand requests, compete for access to the DRAM banks. Consequently, demand requests may be delayed, hurting system performance. This work proposes the Activation/Deactivation Policies (ADP) scheme for hardware prefetchers in multicore processors. The scheme relies on activation policies that turn on the prefetcher of a given core when prefetches are expected to improve performance, and turn it off when performance is expected to improve only slightly or not at all. The proposed mechanism effectively reduces the memory bandwidth requirements of some cores with respect to a typical always-prefetching mechanism, thus making extra bandwidth available to the co-runners. Results on a four-core processor show that ADP prefetching achieves performance within ±2.5% of always prefetching, while significantly reducing the memory bandwidth consumed by useless prefetches. Moreover, in some applications this reduction is as much as 50%. ADP prefetching is applicable to stream-based prefetchers, global-history-buffer delta correlation prefetchers, and PC-based stride prefetchers.
Citations: 2
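As a rough illustration of an activation/deactivation policy, the following hypothetical per-core controller compares performance measured with the prefetcher on and off over a sampling interval and keeps it on only if the relative gain exceeds a threshold. This is a generic sampling heuristic assumed for illustration; the abstract does not specify ADP's actual policies or metrics, and the class name `PrefetchController`, the IPC-based comparison, and the 5% threshold are all hypothetical.

```python
class PrefetchController:
    """Hypothetical per-core on/off controller for a hardware prefetcher.

    Not the ADP policy itself: a simple threshold on measured IPC gain,
    assumed here only to illustrate the activation/deactivation idea.
    """

    def __init__(self, min_gain: float = 0.05) -> None:
        self.min_gain = min_gain  # relative IPC gain required to keep prefetching on
        self.enabled = True       # start with the prefetcher active

    def decide(self, ipc_with_prefetch: float, ipc_without_prefetch: float) -> bool:
        """Update the on/off state from one sampling interval's measurements."""
        if ipc_without_prefetch <= 0:
            raise ValueError("IPC must be positive")
        gain = ipc_with_prefetch / ipc_without_prefetch - 1.0
        # Turn the prefetcher off when it helps little or not at all, which
        # frees DRAM bandwidth for the co-running cores.
        self.enabled = gain >= self.min_gain
        return self.enabled


if __name__ == "__main__":
    ctrl = PrefetchController()
    print(ctrl.decide(1.30, 1.00))  # large gain: stay on
    print(ctrl.decide(1.02, 1.00))  # marginal gain: turn off
```

A real hardware policy would also have to decide how often to sample the "off" state, since measuring it costs some performance on applications that do benefit from prefetching.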
Energy Aware Scheduling of HPC Tasks in Decentralised Cloud Systems
Aeshah Alsughayyir, T. Erlebach
DOI: 10.1109/PDP.2016.83 (https://doi.org/10.1109/PDP.2016.83)
Published: 2016-02-01
Abstract: The increased computational needs in many sectors place huge demands on cloud computing. Power consumption and resource pool capacity are two of the challenges faced by the next generation of high performance computing (HPC). This paper aims at minimising computing energy consumption in decentralised multi-cloud systems using Dynamic Voltage and Frequency Scaling (DVFS) when scheduling dependent HPC tasks under deadline constraints. We propose an energy-aware scheduling algorithm, EAGS. To demonstrate its efficiency, we compare EAGS with the Cloud min-min Scheduling (CMMS) algorithm in different experiments. The simulation results show that our algorithm reduces energy consumption by an average of 63.9% compared with CMMS.
Citations: 10
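Why DVFS saves energy under a deadline can be illustrated with the standard first-order model for dynamic power, P = C * V^2 * f. A task of N cycles runs for N / f seconds, so its dynamic energy is C * V^2 * N, and the lowest-voltage level that still meets the deadline is the cheapest. The sketch below picks such a level for a single task; it is not EAGS (which schedules dependent tasks across decentralised clouds), static power is ignored, and the level table, capacitance value, and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DvfsLevel:
    freq_hz: float  # clock frequency in Hz
    volts: float    # supply voltage in V


def dynamic_energy(level: DvfsLevel, cycles: float, capacitance: float = 1e-9) -> float:
    """First-order dynamic energy: P = C*V^2*f and t = cycles/f, so E = C*V^2*cycles."""
    return capacitance * level.volts ** 2 * cycles


def pick_level(levels: Sequence[DvfsLevel], cycles: float, deadline_s: float,
               capacitance: float = 1e-9) -> Optional[DvfsLevel]:
    """Return the lowest-energy level that finishes `cycles` by the deadline, or None."""
    feasible = [lv for lv in levels if cycles / lv.freq_hz <= deadline_s]
    if not feasible:
        return None  # no level is fast enough; the task would miss its deadline
    return min(feasible, key=lambda lv: dynamic_energy(lv, cycles, capacitance))


if __name__ == "__main__":
    # Hypothetical operating points for one processor.
    levels = [DvfsLevel(1.0e9, 0.8), DvfsLevel(2.0e9, 1.0), DvfsLevel(3.0e9, 1.2)]
    # 3e9 cycles with a 2 s deadline: 1 GHz needs 3 s (too slow), 2 GHz needs 1.5 s.
    print(pick_level(levels, 3e9, 2.0))
```

With dependent tasks, slack on one task shifts the deadlines of its successors, which is what makes the multi-task scheduling problem harder than this single-task choice.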
Introducing Parallelism by Using REPARA C++11 Attributes
M. Danelutto, José Daniel García Sánchez, Luis Miguel Sánchez, Rafael Sotomayor, M. Torquati
DOI: 10.1109/PDP.2016.115 (https://doi.org/10.1109/PDP.2016.115)
Published: 2016-02-01
Abstract: Patterns provide a mechanism to express parallelism at a high level of abstraction and to ease the transformation of existing legacy applications to target parallel frameworks. They also open a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming source code to parallel frameworks. We take advantage of C++11 attributes as a mechanism to introduce annotations and enrich valid source code with semantic information. We also present a methodology for transforming source code that allows targeting multiple parallel programming models. Another contribution is a rule-based mechanism to transform annotated code to those specific programming models. The REPARA approach requires programmer intervention only for the initial code annotation, while providing speedups comparable to those obtained by manual parallelization.
Citations: 20
Computing Multiple Accumulated Cost Surfaces with Graphics Processing Units
G. Trunfio, G. Sirakoulis
DOI: 10.1109/PDP.2016.76 (https://doi.org/10.1109/PDP.2016.76)
Published: 2016-02-01
Abstract: Accumulated cost surfaces (ACSs) are a spatial modelling tool used in a number of fields. Some relevant applications, especially in multi-criteria evaluation and spatial optimization, require several ACSs on the same raster, which may result in a significant computational cost. In this paper, we discuss some techniques available in the literature for accelerating ACS computation using graphics processing units (GPUs) and CUDA. We also describe in detail a new CUDA algorithm suitable for computing multiple ACSs. Moreover, we present some preliminary results on a test case, including an experimental comparison against a fast sequential implementation running on a CPU.
Citations: 6
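For reference, an accumulated cost surface can be computed sequentially with Dijkstra's algorithm over the raster, treating cells as graph nodes. The following CPU sketch assumes 4-connected moves whose cost is the mean of the two cells' costs (one common convention, chosen here as an assumption); it is not the paper's CUDA algorithm. Computing multiple ACSs on the same raster then amounts to repeating the run once per source set, which is the repeated workload the abstract describes.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]


def accumulated_cost_surface(cost: Sequence[Sequence[float]],
                             sources: Sequence[Cell]) -> List[List[float]]:
    """Dijkstra over a raster: minimal accumulated cost from any source cell."""
    rows, cols = len(cost), len(cost[0])
    acs = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        acs[r][c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acs[r][c]:
            continue  # stale queue entry; a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # assumed 4-connectivity
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Assumed move cost: mean of the two cells' friction values.
                nd = d + (cost[r][c] + cost[nr][nc]) / 2.0
                if nd < acs[nr][nc]:
                    acs[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acs


def multiple_acs(cost: Sequence[Sequence[float]],
                 source_sets: Sequence[Sequence[Cell]]) -> List[List[List[float]]]:
    """One independent ACS per source set on the same raster."""
    return [accumulated_cost_surface(cost, sources) for sources in source_sets]


if __name__ == "__main__":
    grid = [[1, 1, 1], [5, 5, 1], [1, 1, 1]]
    for row in accumulated_cost_surface(grid, [(0, 0)]):
        print(row)
```

Because the per-source runs are independent, they are a natural unit of parallelism; how the paper's CUDA algorithm actually organizes that work is not stated in the abstract.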
An OpenACC Optimizer for Accelerating Histogram Computation on a GPU
Kei Ikeda, Fumihiko Ino, K. Hagihara
DOI: 10.1109/PDP.2016.14 (https://doi.org/10.1109/PDP.2016.14)
Published: 2016-02-01
Abstract: This paper presents a source-to-source OpenACC optimizer that automatically optimizes histogram computation code for a graphics processing unit (GPU). Parallel histogram computation codes typically deploy multiple copies of histograms and update them with atomic operations. This duplication method can be implemented as OpenACC code. However, the structure of sequential code blocks must be manually rewritten owing to limitations of OpenACC directives. Such rewritten code does not always achieve the highest performance on arbitrary platforms, and thus the duplication method degrades the performance portability of the code. To tackle this issue, we propose an optimizer that identifies histogram-related blocks in naive OpenACC code and automatically rewrites the detected blocks so that multiple copies of histograms can be exploited for acceleration. In experiments, we apply our optimizer to three practical applications and investigate their performance on three platforms: an NVIDIA GPU, an AMD GPU, and an Intel CPU. Experimental results show that our automated approach helps OpenACC codes maximize the performance of histogram computation, thereby enhancing the performance portability of the code.
Citations: 7
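The duplication method described in this abstract can be illustrated in plain Python: each of several hypothetical workers updates its own copy of the histogram, and the copies are summed at the end. On a GPU the copies reduce contention on atomic updates; this sequential sketch only shows the data layout and the final reduction, it assumes input values are already bin indices in `range(num_bins)`, and it does not reproduce the paper's optimizer.

```python
from typing import List, Sequence


def histogram_with_copies(data: Sequence[int], num_bins: int,
                          num_copies: int) -> List[int]:
    """Histogram built from several private copies, then reduced.

    Element i is credited to copy i % num_copies, standing in for a
    hypothetical worker (or thread group) that owns that copy.
    """
    if num_copies <= 0:
        raise ValueError("num_copies must be positive")
    copies = [[0] * num_bins for _ in range(num_copies)]
    for i, value in enumerate(data):
        if not 0 <= value < num_bins:
            raise ValueError(f"value {value} is not a valid bin index")
        # Sequential here; on a GPU the threads sharing a copy would use atomic adds.
        copies[i % num_copies][value] += 1
    # Final reduction: sum all copies bin by bin.
    return [sum(copy[b] for copy in copies) for b in range(num_bins)]


if __name__ == "__main__":
    print(histogram_with_copies([0, 1, 1, 3, 3, 3], num_bins=4, num_copies=3))
```

The number of copies trades memory and reduction cost against contention, which is one reason a single hand-tuned choice may not be best on every platform.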
Exploring Energy Reduction in Future Technology Nodes via Voltage Scaling with Application to 10nm
Gulay Yalcin, S. Rethinagiri, Oscar Palomar, O. Unsal, A. Cristal, D. Milojevic
DOI: 10.1109/PDP.2016.108 (https://doi.org/10.1109/PDP.2016.108)
Published: 2016-02-01
Abstract: Voltage and frequency downscaling is a well-known scheme for reducing the energy consumption of a computer system. However, the amount of energy saved depends first on the technology node used. Also, when the voltage level is below the safe margin, instructions need to be re-executed due to voltage-related faults, which can add energy overheads and thus nullify the expected energy gains from the lower voltage. Moreover, both fault recovery and frequency reduction impact the performance of the application. In this study, we first evaluate the error rate of several sub-circuits (i.e., functional units) at the n10 future technology node. To reduce the performance impact, we reduce the voltage and frequency of each sub-circuit at a fine granularity while keeping the rest of the system at the nominal voltage level. In this way, in an out-of-order architecture, instruction-level parallelism can mask the performance impact of a relatively slow functional unit. According to our evaluations, the energy consumption of functional units can be reduced by up to 92% with only 8% performance degradation.
Citations: 7
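The trade-off between lowering the voltage and paying for fault re-execution can be illustrated with a simple model. It assumes that dynamic energy per execution scales as (V / V_nom)^2 and that a faulty execution is retried independently until it succeeds, so the expected number of executions at fault probability p is 1 / (1 - p). The error rates in the example are made up for illustration and are not the paper's n10 measurements.

```python
from typing import Dict


def expected_energy(v: float, v_nom: float, fault_prob: float,
                    e_nom: float = 1.0) -> float:
    """Expected energy per instruction under the assumed retry-until-success model."""
    if not 0.0 <= fault_prob < 1.0:
        raise ValueError("fault_prob must be in [0, 1)")
    per_execution = e_nom * (v / v_nom) ** 2  # quadratic dynamic-energy scaling
    return per_execution / (1.0 - fault_prob)  # geometric number of executions


def best_voltage(fault_table: Dict[float, float], v_nom: float) -> float:
    """Voltage in a (hypothetical) fault-rate table with the lowest expected energy."""
    return min(fault_table, key=lambda v: expected_energy(v, v_nom, fault_table[v]))


if __name__ == "__main__":
    # Hypothetical per-instruction fault probabilities for one functional unit.
    table = {1.0: 0.0, 0.8: 1e-4, 0.6: 0.05, 0.5: 0.7}
    for v in sorted(table, reverse=True):
        print(v, round(expected_energy(v, 1.0, table[v]), 4))
    print("best:", best_voltage(table, v_nom=1.0))
```

Below 0.6 V in this made-up table, the retries outweigh the quadratic savings, which is the nullification effect the abstract describes.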
Application of a Technique for Secure Embedded Device Design Based on Combining Security Components for Creation of a Perimeter Protection System
V. Desnitsky, A. Chechulin, Igor Kotenko, D. Levshun, Maxim Kolomeec
DOI: 10.1109/PDP.2016.99 (https://doi.org/10.1109/PDP.2016.99)
Published: 2016-02-01
Abstract: From an information security point of view, embedded devices are elements of complex systems operating in a potentially hostile environment. Therefore, developing embedded devices is a complex task that often requires expert solutions. The complexity of developing secure embedded devices stems from the various types of threats and attacks that may affect the device, and from the fact that, in practice, the security of embedded devices is usually considered only at the final stage of development, in the form of added security features. The paper proposes a design technique, and its application, that facilitates the development of secure and energy-efficient embedded devices. The technique organizes the search for the best combinations of security components by solving an optimization problem. The efficiency of the proposed technique is demonstrated by the development of a room perimeter protection system.
Citations: 6
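One simple way to frame "searching for the best combination of security components" is as a covering problem: choose components that together mitigate every required threat, stay within a cost budget, and minimize energy. The sketch below solves that framing by exhaustive search. The paper's actual optimization formulation is not given in the abstract, so the objective, constraints, component names, and all numbers here are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set, Tuple

# name -> (threats mitigated, energy in mW, monetary cost); all values hypothetical
Component = Tuple[FrozenSet[str], float, float]


def best_combination(components: Dict[str, Component], required: Set[str],
                     max_cost: float) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Lowest-energy component set that covers `required` within the cost budget.

    Exhaustive search: exponential in the number of components, so it only
    suits small catalogs. Returns (None, inf) if no combination is feasible.
    """
    best, best_energy = None, float("inf")
    names = sorted(components)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(components[n][0] for n in combo))
            energy = sum(components[n][1] for n in combo)
            cost = sum(components[n][2] for n in combo)
            if required <= covered and cost <= max_cost and energy < best_energy:
                best, best_energy = combo, energy
    return best, best_energy


if __name__ == "__main__":
    catalog = {
        "aes_module": (frozenset({"eavesdropping"}), 30, 5),
        "tamper_switch": (frozenset({"physical_tamper"}), 5, 2),
        "secure_boot": (frozenset({"firmware_swap", "physical_tamper"}), 10, 6),
        "tls_stack": (frozenset({"eavesdropping", "spoofing"}), 50, 4),
        "auth_token": (frozenset({"spoofing"}), 8, 3),
    }
    needed = {"eavesdropping", "spoofing", "physical_tamper"}
    print(best_combination(catalog, needed, max_cost=12))
```

For larger catalogs, the same formulation could be handed to an integer-programming or constraint solver instead of enumerating every subset.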
Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack
Adriano Vogel, Dalvan Griebler, Carlos A. F. Maron, C. Schepke, L. G. Fernandes
DOI: 10.1109/PDP.2016.75 (https://doi.org/10.1109/PDP.2016.75)
Published: 2016-02-01
Abstract: Despite the evolution of cloud computing in recent years, the performance of the available private cloud tools, and a comprehensive understanding of them, are still under research. This paper contributes an analysis of the Infrastructure as a Service (IaaS) domain by mapping new insights and discussing the challenges for improving cloud services. The goal is a comparative analysis of the OpenNebula, OpenStack, and CloudStack tools, evaluating their differences in support for flexibility and resiliency. We also evaluate these three cloud tools when deployed on a common hypervisor (KVM) to discover new empirical insights. Our results demonstrate that OpenStack is the most resilient and CloudStack is the most flexible for deploying an IaaS private cloud. Moreover, the performance experiments revealed some contrasts among the private IaaS cloud instances when running intensive workloads and scientific applications.
Citations: 39
A Hardware Scheduler for Multicore Block Cipher Processor
Sang Muk Lee, E. Ko, Seung Eun Lee
DOI: 10.1109/PDP.2016.59 (https://doi.org/10.1109/PDP.2016.59)
Published: 2016-02-01
Abstract: As a consequence of the increasing demand for high-performance systems, multiprocessor architectures have become a trend and are used in a variety of fields (e.g., PCs, laptops, and mobile devices). Multicore processors can achieve outstanding throughput at relatively low operating frequency and power consumption. To obtain the maximum throughput from a multicore structure, tasks must be scheduled across the multiple cores. In this paper, we propose a hardware scheduler tailored for a multicore block cipher processor and verify its feasibility using the AES algorithm.
Citations: 4
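A small software model can make the scheduling problem concrete. The following sketch dispatches each cipher block to whichever core becomes free first and reports the makespan in cycles. The uniform per-block latency, the earliest-free dispatch policy, and the function name are assumptions for illustration, not the paper's hardware design. It also assumes blocks are independent, which holds for modes such as ECB or CTR but not for CBC encryption, where each block depends on the previous ciphertext.

```python
import heapq
from typing import List, Tuple


def schedule_blocks(num_blocks: int, num_cores: int,
                    cycles_per_block: int) -> Tuple[List[Tuple[int, int, int]], int]:
    """Assign independent cipher blocks to the earliest-free core (hypothetical model).

    Returns (assignment, makespan), where assignment holds
    (block index, core id, start cycle) and every block takes the same latency.
    """
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    free_at = [(0, core) for core in range(num_cores)]  # (cycle when free, core id)
    heapq.heapify(free_at)
    assignment = []
    for block in range(num_blocks):
        start, core = heapq.heappop(free_at)  # earliest-free core, lowest id on ties
        assignment.append((block, core, start))
        heapq.heappush(free_at, (start + cycles_per_block, core))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan


if __name__ == "__main__":
    plan, total = schedule_blocks(num_blocks=10, num_cores=4, cycles_per_block=10)
    for entry in plan:
        print(entry)
    print("makespan:", total)
```

With uniform latencies this reduces to round-robin; the earliest-free rule only starts to matter when per-block latencies differ, for example across key sizes.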