{"title":"The UA↔CG Workflow: High Performance Molecular Dynamics of Coarse-Grained Polymers","authors":"David Ozog, A. Malony, M. Guenza","doi":"10.1109/PDP.2016.127","DOIUrl":"https://doi.org/10.1109/PDP.2016.127","url":null,"abstract":"Our analytically based technique for coarse-graining (CG) polymer simulations dramatically improves spatial and temporal scaling while preserving thermodynamic quantities and bulk properties. The purpose of CG codes is to run more efficient molecular dynamics simulations, yet the research field generally lacks thorough analysis of how such codes scale with respect to full-atom representations. This paper conducts an in-depth performance study of highly realistic polymer melts on modern supercomputing systems. We also present a workflow that integrates our analytical solution for calculating CG forces with new high-performance techniques for mapping back and forth between the atomistic and CG descriptions in LAMMPS. The workflow benefits from the performance of CG, while maintaining full-atom accuracy. Our results show speedups up to 12x faster than atomistic simulations.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129902210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Simple Activation/Deactivation Prefetching Scheme for Chip Multiprocessors","authors":"Vicent Selfa, Crispín Gómez Requena, M. E. Gómez, J. Sahuquillo","doi":"10.1109/PDP.2016.47","DOIUrl":"https://doi.org/10.1109/PDP.2016.47","url":null,"abstract":"Prefetching significantly reduces the memory latencies of a wide range of applications and thus increases the system performance. However, as a speculative technique, prefetching may also noticeably increase the number of memory accesses, which in turns may negatively impact on the main memory bandwidth consumption, performance, and power. Main memory bandwidth consumption is a critical resource especially in the context of current multicore processors since memory requests from all the cores, both prefetch and demand requests, compete among them in the access to the DRAM banks. Consequently, demand requests may be delayed hurting the system performance. This work proposes the Activation/Deactivation Policies (ADP) scheme for hardware prefetchers in multicore processors. This scheme relies on activation policies that turn on the prefetcher on a given core when it is expected that prefetches will improve the performance, and turn off the prefetcher of that core when it is foreseen that performance will be scarcely improved or not improved at all. The proposed mechanism effectively reduces the memory bandwidth requirements of some cores with respect to a typical always prefetching mechanism, so making available extra bandwidth to the co-runners. Results in a four-core processor show that ADP prefetching achieves similar performance ±2.5% as always prefetching, while significantly reducing the memory bandwidth consumed by useless prefetches. Moreover, in some applications this reduction is as much as 50%. ADP prefetching is applicable to stream-based prefetchers, global-history-buffer delta correlation prefetchers, and PC-based stride prefetchers.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"132 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130890667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Aware Scheduling of HPC Tasks in Decentralised Cloud Systems","authors":"Aeshah Alsughayyir, T. Erlebach","doi":"10.1109/PDP.2016.83","DOIUrl":"https://doi.org/10.1109/PDP.2016.83","url":null,"abstract":"The increased computational needs in many sectors place huge demands on cloud computing. Power consumption and resource pool capacity are two of the challenges faced by the next generation of high performance computing (HPC). This paper aims at minimising the computing-energy consumption in decentralised multi-cloud systems using Dynamic Voltage and Frequency Scaling (DVFS) when scheduling dependent HPC tasks under deadline constraints. We propose an energy-aware scheduling algorithm EAGS. To demonstrate the efficiency of our algorithm EAGS, we compare it with the Cloud min-min Scheduling (CMMS) algorithm in different experiments. The simulation results show that our algorithm can produce energy consumption lower than CMMS by an average of 63.9%.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130967390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing Parallelism by Using REPARA C++11 Attributes","authors":"M. Danelutto, José Daniel García Sánchez, Luis Miguel Sánchez, Rafael Sotomayor, M. Torquati","doi":"10.1109/PDP.2016.115","DOIUrl":"https://doi.org/10.1109/PDP.2016.115","url":null,"abstract":"Patterns provide a mechanism to express parallelism at a high level of abstraction and to make easier the transformation of existing legacy applications to target parallel frameworks. That also opens a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming the source code to parallelism frameworks. We take advantage of C++11 attributes as a mechanism to introduce annotations and enrich semantic information on valid source code. We also present a methodology for performing transformation of source code that allows to target multiple parallel programming models. Another contribution is a rule based mechanism to transform annotated code to those specific programming models. The REPARA approach requires programmer intervention only to perform initial code annotation while providing speedups that are comparable to those obtained by manual parallelization.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132586928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing Multiple Accumulated Cost Surfaces with Graphics Processing Units","authors":"G. Trunfio, G. Sirakoulis","doi":"10.1109/PDP.2016.76","DOIUrl":"https://doi.org/10.1109/PDP.2016.76","url":null,"abstract":"Accumulated cost surfaces (ACSs) are a tool for spatial modelling used in a number of fields. Some relevant applications, especially in the areas of multi-criteria evaluation and spatial optimization, require the availability of several ACSs on the same raster, which may result in a significant computational cost. In this paper, we discuss some techniques available in the literature for accelerating the ACS computation using graphics processing units (GPUs) and CUDA. Also, we illustrate in details a new CUDA algorithm suitable for the computation of multiple ACSs. Moreover, we present some preliminary results on a test case, including an experimental comparison against a fast sequential implementation running on a CPU.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134229695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An OpenACC Optimizer for Accelerating Histogram Computation on a GPU","authors":"Kei Ikeda, Fumihiko Ino, K. Hagihara","doi":"10.1109/PDP.2016.14","DOIUrl":"https://doi.org/10.1109/PDP.2016.14","url":null,"abstract":"This paper presents a source-to-source OpenACC optimizer that automatically optimizes a histogram computation code for a graphics processing unit (GPU). Parallel histogram computation codes typically deploy multiple copies of histograms and update them with atomic operations. This duplication method can be implemented as an OpenACC code. However, the structure of sequential code blocks must be manually rewritten owing to the limitation on OpenACC directives. Such a rewritten code does not always achieve the highest performance on arbitrary platforms, and thus, the duplication method degrades the performance portability of the code. To tackle this issue, we propose an optimizer that identifies histogram-related blocks in a naive OpenACC code and automatically rewrites the detected blocks such that multiple copies of histograms can be exploited for acceleration. In experiments, we apply our optimizer to three practical applications and investigate their performance on three platforms: an NVIDIA GPU, an AMD GPU and an Intel CPU. Experimental results show that our automated approach is useful for OpenACC codes to maximize the performance of histogram computation, and thereby enhancing the performance portability of the code.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"15 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130228966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Energy Reduction in Future Technology Nodes via Voltage Scaling with Application to 10nm","authors":"Gulay Yalcin, S. Rethinagiri, Oscar Palomar, O. Unsal, A. Cristal, D. Milojevic","doi":"10.1109/PDP.2016.108","DOIUrl":"https://doi.org/10.1109/PDP.2016.108","url":null,"abstract":"Voltage and frequency downscaling is a well-known scheme in order to reduce the energy consumption of a computer system. However, the quantity of the saved energy first depends on the utilized technology node. Also, when the voltage level is below the safe margin, instructions need to be re-executed due to voltage related faults which can present additional energy overheads thus nullifying the expected energy gains from the lower voltage. Moreover, both fault recovery and frequency reduction impacts the performance of the application. In this study, we first evaluate the error rate of several sub-circuits (i.e. functional units) at the n10 future technology node. In order to reduce the performance impact, we reduce the voltage and frequency of each sub-circuit at a fine granularity while we keep the frequency of the rest of the system in the nominal voltage level. In this way, in an out-of-order architecture instruction level parallelism can mask the performance impact of a relatively slow functional unit. According to our evaluations, the energy consumption of functional units can be reduced up to 92% with only 8% performance degradation.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126748505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of a Technique for Secure Embedded Device Design Based on Combining Security Components for Creation of a Perimeter Protection System","authors":"V. Desnitsky, A. Chechulin, Igor Kotenko, D. Levshun, Maxim Kolomeec","doi":"10.1109/PDP.2016.99","DOIUrl":"https://doi.org/10.1109/PDP.2016.99","url":null,"abstract":"From information security point of view embedded devices are the elements of complex systems operating in a potentially hostile environment. Therefore development of embedded devices is a complex task that often requires expert solutions. The complexity of the task of developing secure embedded devices is caused by various types of threats and attacks that may affect the device, as well as that in practice security of embedded devices is usually considered at the final stage of the development process in the form of adding additional security features. The paper proposes a design technique and its application that will facilitate development of secure and energy-efficient embedded devices. The technique organizes the search for the best combinations of security components on the basis of solving an optimization problem. The efficiency of the proposed technique is demonstrated by development of a room perimeter protection system.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"48 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126072357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack","authors":"Adriano Vogel, Dalvan Griebler, Carlos A. F. Maron, C. Schepke, L. G. Fernandes","doi":"10.1109/PDP.2016.75","DOIUrl":"https://doi.org/10.1109/PDP.2016.75","url":null,"abstract":"Despite the evolution of cloud computing in recent years, the performance and comprehensive understanding of the available private cloud tools are still under research. This paper contributes to an analysis of the Infrastructure as a Service (IaaS) domain by mapping new insights and discussing the challenges for improving cloud services. The goal is to make a comparative analysis of OpenNebula, OpenStack and CloudStack tools, evaluating their differences on support for flexibility and resiliency. Also, we aim at evaluating these three cloud tools when they are deployed using a mutual hypervisor (KVM) for discovering new empirical insights. Our research results demonstrated that OpenStack is the most resilient and CloudStack is the most flexible for deploying an IaaS private cloud. Moreover, the performance experiments indicated some contrasts among the private IaaS cloud instances when running intensive workloads and scientific applications.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115343440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hardware Scheduler for Multicore Block Cipher Processor","authors":"Sang Muk Lee, E. Ko, Seung Eun Lee","doi":"10.1109/PDP.2016.59","DOIUrl":"https://doi.org/10.1109/PDP.2016.59","url":null,"abstract":"In consequence of an increasing demand for high-performance system, multiprocessor architectures became trend and used in a variety of fields (e.g. PC, laptops, mobile devices and so on). Multi-core processor can get outstanding throughput with relatively lower operating frequency and power consumption. In order to obtain the maximum throughput in a multi-core structure, it is necessary to schedule assigning tasks to multiple cores. In this paper, we propose a hardware scheduler that is tailored for multicore block cipher and verify the feasibility of the scheduler using AES algorithm.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123906619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}