2019 International Conference on High Performance Computing & Simulation (HPCS)最新文献_第10页

Configuring Graph Traversal Applications for GPUs: Analysis of Implementation Strategies and their Correlation with Graph Characteristics 配置图形处理器的图形遍历应用:实现策略及其与图形特性的相关性分析

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188204

F. Busato, N. Bombieri

{"title":"Configuring Graph Traversal Applications for GPUs: Analysis of Implementation Strategies and their Correlation with Graph Characteristics","authors":"F. Busato, N. Bombieri","doi":"10.1109/HPCS48598.2019.9188204","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188204","url":null,"abstract":"Implementing a graph traversal (GT) algorithm for GPUs is a very challenging task. It is a core primitive for many graph analysis applications and its efficiency strongly impacts on the overall application performance. Different strategies have been proposed to implement the GT algorithm by exploiting the GPU characteristics. Nevertheless, the efficiency of each of them strongly depends on the graph characteristics. This paper presents an analysis of the most important features of the parallel GT algorithm, which include frontier queue management, load balancing, duplicate removing, and synchronization during graph traversal iterations. It shows different techniques to implement each of such features for GPUs and the comparison of their performance when applied on a very large and heterogeneous set of graphs. The results allow identifying, for each feature and among different implementation techniques of them, the best configuration to address the graph characteristics. The paper finally presents how such a configuration analysis and set allow traversing graphs with throughput up to 14,000 MTEPS on single GPU devices, with speedups ranging from 1.2x to 18.5x with regard to the best parallel applications for GT on GPUs at the state of the art.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"73 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126108310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

PCS: A Productive Computational Science Platform PCS:一个多产的计算科学平台

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188108

David Ojika, A. Gordon-Ross, H. Lam, Shinjae Yoo, Younggang Cui, Zhihua Dong, K. V. Dam, Seyong Lee, T. Kurth

{"title":"PCS: A Productive Computational Science Platform","authors":"David Ojika, A. Gordon-Ross, H. Lam, Shinjae Yoo, Younggang Cui, Zhihua Dong, K. V. Dam, Seyong Lee, T. Kurth","doi":"10.1109/HPCS48598.2019.9188108","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188108","url":null,"abstract":"As modern supercomputers continue to be increasingly heterogeneous with diverse computational accelerators (graphics processing units (GPUs), fieldprogrammable gate arrays (FPGAs), application specific integrated circuits (ASICs), etc.), software becomes a critical design aspect. Exploiting this new computational power requires increased software design time and effort to make valuable scientific discovery in the face of the complicated programming environments introduced by these accelerators. To address these challenges, we propose unifying multiple programming models into a single programming environment to facilitate large-scale, accelerator-aware, heterogeneous computing for next-generation scientific applications. This paper presents PCS, a productive computational science platform for cluster-scale heterogeneous computing. Focusing FPGAs, we describe the key concepts of the PCS platform and differentiate PCS from the current state-of-the-art, propose a new multi-FPGA architecture for graph-centric workloads (e.g., deep learning, etc.) with discussions on ongoing work.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127123818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Enhancing Reliability of Compute Environments on Amazon EC2 Spot Instances 增强Amazon EC2 Spot实例上计算环境的可靠性

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188116

Altino M. Sampaio, Jorge G. Barbosa

{"title":"Enhancing Reliability of Compute Environments on Amazon EC2 Spot Instances","authors":"Altino M. Sampaio, Jorge G. Barbosa","doi":"10.1109/HPCS48598.2019.9188116","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188116","url":null,"abstract":"Amazon Elastic Compute Cloud (EC2) gives access to resources in the form of virtual servers, also known as instances. EC2 Spot Instances (SIs) offer spare compute capacity at steep discounts compared to reliable and fixed price on-demand instances. The drawback, however, is that waiting time until requested spots become fulfilled can be incredible high. In this paper, we propose a container migration-based solution to enhance the reliability of virtual cluster computing environments built on top of non-reserved EC2 pricing model instances. We compare the performance of our algorithm by executing different resource provisioning plans for running real-life workflow applications, constrained by user-defined deadline and budget Quality of Service (QoS) parameters. The results show that our solution is able to successfully conclude almost 98% of workflow applications and more than 99% of workflow tasks for on-demand- and spot block-based virtual compute environments. For SI-based virtual compute environments, our solution achieves similar results, completing more than 98% of workflow applications, and over 99% of workflow tasks, for a worse-case scenario.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127533251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Multi-Parameter Performance Modeling using Symbolic Regression 基于符号回归的多参数性能建模

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188202

Sai P. Chenna, G. Stitt, H. Lam

{"title":"Multi-Parameter Performance Modeling using Symbolic Regression","authors":"Sai P. Chenna, G. Stitt, H. Lam","doi":"10.1109/HPCS48598.2019.9188202","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188202","url":null,"abstract":"Performance modeling is becoming critically important due to the need for design-space exploration on emerging exascale architectures. Existing modeling and prediction approaches are either restricted by a limited number of parameters, or provide extreme tradeoffs between simulation performance and modeling accuracy that are not ideal for exascale simulations. At one extreme are low-level discrete-event simulators, which provide high accuracy, but are prohibitively slow for large-scale simulations. At the opposite extreme are abstract modeling approaches that are sufficiently fast, but tend to support a limited number of parameters, while also lacking accuracy due to machine-specific behaviors that deviate from anticipated models. In this paper, we improve upon existing abstract modeling approaches by leveraging symbolic regression to automatically discover an underlying multi-parameter model of the system and application that captures difficult-to-understand behaviors. For three High Performance Computing (HPC) applications running on Vulcan, we show that symbolic regression provided modeling accuracies that were $3.5 times, 4.6 times$, and $6.2 times$ better than analytical models developed using linear regression.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127290840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Feedback-Based Resource Allocation for Batch Scheduling of Scientific Workflows 基于反馈的科学工作流批量调度资源分配

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188055

Carl Witt, Dennis Wagner, U. Leser

{"title":"Feedback-Based Resource Allocation for Batch Scheduling of Scientific Workflows","authors":"Carl Witt, Dennis Wagner, U. Leser","doi":"10.1109/HPCS48598.2019.9188055","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188055","url":null,"abstract":"A scientific workflow is a set of interdependent compute tasks orchestrating large scale data analyses or in-silico experiments. Workflows often comprise thousands of tasks with heterogeneous resource requirements that need to be executed on distributed resources. Many workflow engines solve parallelization by submitting tasks to a batch scheduling system, which requires resource usage estimates that have to be provided by users. We investigate the possibility to improve upon inaccurate user estimates by incorporating an online feedback loop between workflow scheduling, resource usage prediction, and measurement.Our approach can learn resource usage of arbitrary type; in this paper, we demonstrate its effectiveness by predicting peak memory usage of tasks, as it is an especially sensitive resource type that leads to task termination if underestimated and leads to decreased throughput if overestimated.We compare online versions of standard machine learning models for peak memory usage prediction and analyze their interactions with different workflow scheduling strategies. By means of extensive simulation experiments, we found that the proposed feedback mechanism improves resource utilization and execution times compared to typical user estimates.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125253992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Performance Prediction for Power-Capped Applications based on Machine Learning Algorithms 基于机器学习算法的功率限制应用性能预测

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188144

Bo Wang, Jannis Klinkenberg, D. Ellsworth, C. Terboven, Matthias S. Müller

引用次数: 0

Energy performances of a routing protocol based on fuzzy logic approach in an underwater wireless sensor networks 基于模糊逻辑的水下无线传感器网络路由协议能量性能研究

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188061

Hajar Bennouri, A. Berqia

引用次数: 1

Cost Reduction Bounds of Proactive Management Based on Request Prediction

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188199

R. Milocco, P. Minet, É. Renault, S. Boumerdassi

{"title":"Cost Reduction Bounds of Proactive Management Based on Request Prediction","authors":"R. Milocco, P. Minet, É. Renault, S. Boumerdassi","doi":"10.1109/HPCS48598.2019.9188199","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188199","url":null,"abstract":"Data Centers (DCs) need to manage their servers periodically to meet user demand efficiently. Since the cost of the energy employed to serve the user demand is lower when DC settings (e.g. number of active servers) are done a priori (proactively), there is a great interest in studying different proactive strategies based on predictions of requests. The amount of savings in energy cost that can be achieved depends not only on the selected proactive strategy but also on the statistics of the demand and the predictors used. Despite its importance, due to the complexity of the problem it is difficult to find studies that quantity the savings that can be obtained. The main contribution of this paper is to propose a generic methodology to quantity the possible cost reduction using proactive management based on predictions. Thus, using this method together with past data it is possible to quantity the efficiency of different predictors as well as optimize proactive strategies. In this paper, the cost reduction is evaluated using both ARMA (Auto Regressive Moving Average) and LV (Last Value) predictors. We then apply this methodology to the Google dataset collected over a period of 29 days to evaluate the benefit that can be obtained with those two predictors in the considered DC.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127661621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Approximating Memory-bound Applications on Mobile GPUs 在移动gpu上近似内存绑定应用程序

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188051

Daniel Maier, Nadjib Mammeri, Biagio Cosenza, B. Juurlink

{"title":"Approximating Memory-bound Applications on Mobile GPUs","authors":"Daniel Maier, Nadjib Mammeri, Biagio Cosenza, B. Juurlink","doi":"10.1109/HPCS48598.2019.9188051","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188051","url":null,"abstract":"Approximate computing techniques are often used to improve the performance of applications that can tolerate some amount of impurity in the calculations or data. In the context of embedded and mobile systems, a broad number of applications have exploited approximation techniques to improve performance and overcome the limited capabilities of the hardware. On such systems, even small performance improvements can be sufficient to meet scheduled requirements such as hard real-time deadlines. We study the approximation of memory-bound applications on mobile GPUs using kernel perforation, an approximation technique that exploits the availability of fast GPU local memory to provide high performance with more accurate results. Using this approximation technique, we approximated six applications and evaluated them on two mobile GPU architectures with very different memory layouts: a Qualcomm Adreno 506 and an ARM Mali T860 MP2. Results show that, even when the local memory is not mapped to dedicated fast memory in hardware, kernel perforation is still capable of $1.25times$ speedup because of improved memory layout and caching effects. Mobile GPUs with local memory show a speedup of up to $1.38times$.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"151 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132904254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Thin-Threads: An Approach for History-Based Monte Carlo on GPUs 细线程:gpu上基于历史的蒙特卡罗方法

2019 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2019-07-01 DOI: 10.1109/HPCS48598.2019.9188080

R. Bleile, P. Brantley, D. Richards, S. Dawson, M. S. McKinley, M. O’Brien, H. Childs

引用次数: 1