2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID): Latest Publications

Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms
Lucas Leandro Nesi, Samuel Thibault, Luka Stanisic, L. Schnorr
DOI: 10.1109/CCGRID.2019.00025
Abstract: Programming parallel applications for heterogeneous HPC platforms is much more straightforward when using the task-based programming paradigm. The simplicity exists because a runtime takes care of many activities usually carried out by the application developer, such as task mapping, load balancing, and memory management operations. In this paper, we present a visualization-based performance analysis methodology to investigate the CPU-GPU-disk memory management of the StarPU runtime, a popular task-based middleware for HPC applications. We detail the design of novel graphical strategies that were fundamental to recognizing performance problems in four case studies. We first identify poor management of data handles when GPU memory is saturated, leading to low application performance. Our experiments using the dense tiled Cholesky factorization show that our fix leads to performance gains of 66% and better scalability for larger input sizes. In the other three cases, we study scenarios where the main memory is insufficient to store all the application's data, forcing the runtime to store data out-of-core. Using our methodology, we pinpoint differences in behavior among schedulers and identify a crucial problem in the application code regarding initial block placement, which leads to poor performance.
Citations: 12
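The tiled Cholesky factorization used in the experiments above is a classic task-graph workload. Below is a generic sketch (not the paper's StarPU code) of how a runtime enumerates the POTRF/TRSM/SYRK/GEMM tasks that it then maps to CPUs and GPUs while managing the data transfers:

```python
def cholesky_tasks(num_tiles):
    """Enumerate the task graph of a right-looking tiled Cholesky
    factorization; a runtime such as StarPU schedules these tasks
    and moves the tiles they touch between CPU, GPU, and disk."""
    tasks = []
    for k in range(num_tiles):
        tasks.append(("POTRF", k, k))            # factorize diagonal tile
        for i in range(k + 1, num_tiles):
            tasks.append(("TRSM", i, k))         # triangular solve on panel tile
        for i in range(k + 1, num_tiles):
            tasks.append(("SYRK", i, k))         # symmetric rank-k update
            for j in range(k + 1, i):
                tasks.append(("GEMM", i, j, k))  # trailing-matrix update
    return tasks
```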
One Can Only Gain by Replacing EASY Backfilling: A Simple Scheduling Policies Case Study
Danilo Carastan-Santos, R. Camargo, D. Trystram, Salah Zrigui
DOI: 10.1109/CCGRID.2019.00010
Abstract: High-Performance Computing (HPC) platforms are growing in size and complexity. To improve the quality of service of such platforms, researchers devote a great deal of effort to devising algorithms and techniques that improve different aspects of performance, such as energy consumption, total platform usage, and fairness between users. Despite this, system administrators remain reluctant to deploy state-of-the-art scheduling methods, and most revert to EASY backfilling, also known as EASY-FCFS (EASY-First-Come-First-Served): newer methods are frequently complex and opaque, and the simplicity and transparency of EASY are too important to sacrifice. In this work, we used execution logs from five HPC platforms to compare four simple scheduling policies: FCFS, Shortest estimated Processing time First (SPF), Smallest Requested Resources First (SQF), and Smallest estimated Area First (SAF). Using simulations, we performed a thorough analysis of the cumulative results for up to 180 weeks, considering three scheduling objectives: waiting time, slowdown, and per-processor slowdown. We also evaluated other effects, such as the relationship between job size and slowdown, the distribution of slowdown values, and the number of backfilled jobs, for each HPC platform and scheduling policy. We conclude that one can only gain by replacing EASY backfilling with SAF with backfilling, as it improves the slowdown metric by up to 80% while maintaining the simplicity and transparency of FCFS. Moreover, SAF reduces the number of jobs with large slowdowns, and the inclusion of a simple thresholding mechanism guarantees that no starvation occurs. Finally, we propose SAF as a new benchmark for future scheduling studies.
Citations: 26
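The four policies compared in this study reduce to different sort keys over the waiting queue. A minimal sketch, with illustrative job fields (not the paper's simulator code):

```python
# Each job is a dict with submit time, estimated runtime, and
# requested resources; the field names here are hypothetical.
def fcfs_key(job):   # First-Come-First-Served
    return job["submit"]

def spf_key(job):    # Shortest estimated Processing time First
    return job["est_runtime"]

def sqf_key(job):    # Smallest Requested Resources First
    return job["resources"]

def saf_key(job):    # Smallest estimated Area First
    return job["est_runtime"] * job["resources"]  # area = runtime x resources

def order_queue(jobs, key):
    """Order the waiting queue under a policy; a backfilling step
    would then fill idle holes without delaying the head job."""
    return sorted(jobs, key=key)
```

Under SAF, a short narrow job jumps ahead of a long wide one even if it was submitted later, which is exactly the behavior that reduces large slowdowns.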
[Copyright notice]
DOI: 10.1109/ccgrid.2019.00003
Citations: 0
Enabling Large Scale Data Production for OpenDose with GATE on the EGI Infrastructure
M. Chauvin, Gilles Mathieu, S. Camarasu-Pop, Axel Bonnet, M. Bardiès, I. Perseil
DOI: 10.1109/CCGRID.2019.00084
Abstract: The OpenDose collaboration was established to generate an open and traceable reference database of dosimetric data for nuclear medicine, using a variety of Monte Carlo codes. The amount of data to generate requires running tens of thousands of simulations per anthropomorphic model, for a total computation time estimated at millions of CPU hours. To tackle this challenge, a project was initiated to enable large-scale data production with the Monte Carlo code GATE. Within this project, CRCT, Inserm CISI, and CREATIS developed solutions to run GATE simulations on the EGI grid infrastructure using existing tools such as VIP and GateLab. These developments include a new GATE grid application deployed on VIP, modifications to the existing GateLab application, and client code that uses a REST API to drive both. The tools developed so far have allowed running 30% of the GATE simulations for the first two models (adult male and adult female). Ongoing and future work includes improvements to both the code and the submission strategies, documentation and packaging of the code, definition and implementation of a long-term storage strategy, extension to other models, and generalisation of the tools to the other Monte Carlo codes used within the OpenDose collaboration.
Citations: 3
A Performance Driven Micro Services-Based Architecture/System for Analyzing Noisy IoT Data
M. Bolic, S. Majumdar
DOI: 10.1109/CCGRID.2019.00031
Abstract: The Internet of Things (IoT) presents a complex and challenging paradigm in which a huge amount of noisy raw sensor data is collected in order to observe and detect critical events occurring in the system and to generate alarms when required. The biggest challenge of IoT systems is that they collect massive amounts of uncertain data from diverse IoT devices connected through the network. In addition, some events are inferred from other events, and uncertainty propagates from parent events to the inferred events, which further contributes to overall system uncertainty. Observed complex events are composed of primitive events that are produced by IoT devices and collected in IoT systems. A survey of prior art on quantifying uncertainty for complex events concluded that existing solutions are unable to scale under heavy loads of incoming data. This paper presents a microservice-based notification methodology that uses complex event recognition (both complex event processing and probabilistic programming) to handle the uncertainty of IoT systems. In addition, the paper analyzes and recommends existing big data platforms for processing complex events in IoT systems. The current focus of our work includes research and development of an optimized deadline-based and cost-effective resource allocation algorithm in Apache Spark for uncertain IoT notification systems.
Citations: 4
Scalable Video Transcoding in Public Clouds
Qingye Jiang, Young Choon Lee, Albert Y. Zomaya
DOI: 10.1109/CCGRID.2019.00017
Abstract: In this paper, we present the challenges involved in a large-scale video transcoding application in public clouds. We introduce the architecture of an existing video transcoding system that is tightly coupled with an existing video sharing service. We examine the horizontal scalability of the video transcoding system on AWS EC2. With an online transaction processing (OLTP) model, the system achieves linear horizontal scalability up to 1,000 vCPU cores, but starts to experience performance degradation beyond that. We analyze the resource consumption pattern of the existing system, then introduce an improved architecture by adding a message queue layer. This effectively decouples the video transcoding system from the video sharing service and converts the OLTP model into a batch processing model. Large-scale evaluations on AWS EC2 indicate that the improved design maintains linear horizontal scalability at 10,100 vCPU cores. The hybrid design of the system allows it to be easily adapted for other batch processing use cases without the need to modify or recompile the application.
Citations: 5
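The decoupling described above, converting OLTP-style direct calls into batch processing through a message queue, can be sketched with an in-process queue; the real system would use a distributed broker, and the transcode step here is a stand-in:

```python
import queue
import threading

def transcode(job):
    # Placeholder for the actual transcoding work (e.g. an ffmpeg run).
    return job + ".transcoded"

def worker(q, results):
    """Consumer side: drain jobs from the queue until a None sentinel
    arrives. Producers enqueue and return immediately, so the video
    sharing service never waits on the transcoder."""
    while True:
        job = q.get()
        if job is None:
            q.task_done()
            break
        results.append(transcode(job))
        q.task_done()
```

Because producers only touch the queue, the worker pool can be scaled horizontally without changing the producer side, which is the property the paper exploits to reach high vCPU counts.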
Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks
Adrián Castelló, M. F. Dolz, E. S. Quintana‐Ortí, J. Duato
DOI: 10.1109/CCGRID.2019.00068
Abstract: We analyze the asymptotic performance of the training process of deep neural networks (NNs) on clusters in order to determine their scalability. For this purpose, i) we assume a data-parallel implementation of the training algorithm, which distributes the batches among the cluster nodes and replicates the model; ii) we leverage the roofline model to inspect the performance at the node level, taking into account the floating-point unit throughput and memory bandwidth; and iii) we consider distinct collective communication schemes that are optimal depending on the message size and the underlying network interconnection topology. We then apply the resulting performance model to analyze the scalability of several well-known deep convolutional neural networks as a function of the batch size, node floating-point throughput, node memory bandwidth, cluster dimension, and link bandwidth.
Citations: 12
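The roofline model mentioned in point ii) bounds attainable node performance by either the compute peak or the memory traffic, whichever is lower. A one-function sketch (the numbers in the usage note are illustrative, not taken from the paper):

```python
def roofline(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Attainable performance (FLOP/s) under the roofline model:
    memory-bound below the ridge point, compute-bound above it.

    peak_flops           -- node peak floating-point rate (FLOP/s)
    mem_bandwidth        -- node memory bandwidth (bytes/s)
    arithmetic_intensity -- kernel FLOPs per byte of memory traffic
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)
```

For example, a node with a 10 TFLOP/s peak and 900 GB/s of bandwidth is memory-bound for a kernel at 4 FLOP/byte but compute-bound for one at 100 FLOP/byte.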
Scalability of the NewMadeleine Communication Library for Large Numbers of MPI Point-to-Point Requests
Alexandre Denis
DOI: 10.1109/CCGRID.2019.00051
Abstract: New kinds of applications have emerged that use many threads or irregular communication patterns and rely heavily on point-to-point MPI communication. They stress the MPI library with potentially many simultaneous MPI requests for sending and receiving at the same time. When dealing with large numbers of simultaneous requests, the bottleneck lies in two main mechanisms: tag matching (the algorithm that matches an incoming packet with a posted receive request) and the progression engine. In this paper, we propose algorithms and implementations that overcome these issues so as to scale up to thousands of requests if needed. In particular, our algorithms perform constant-time tag matching even with any-source and any-tag support. We have implemented these mechanisms in our NewMadeleine communication library. Through micro-benchmarks and computation-kernel benchmarks, we demonstrate that our MPI library exhibits better performance than state-of-the-art MPI implementations in cases with many simultaneous requests.
Citations: 6
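Tag matching becomes constant time for fully specified requests once posted receives are indexed by (source, tag). The toy below illustrates only that idea: unlike the paper's algorithm, it still scans wildcard requests linearly and ignores MPI's posted-order matching rules:

```python
from collections import deque

ANY_SOURCE = -1  # stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG
ANY_TAG = -1

class TagMatcher:
    """Index posted receives by (source, tag) for O(1) matching;
    wildcard requests go to a separate FIFO."""
    def __init__(self):
        self.exact = {}      # (source, tag) -> FIFO of requests
        self.wild = deque()  # (source, tag, request) tuples with wildcards

    def post_recv(self, source, tag, request):
        if source == ANY_SOURCE or tag == ANY_TAG:
            self.wild.append((source, tag, request))
        else:
            self.exact.setdefault((source, tag), deque()).append(request)

    def match(self, source, tag):
        """Match an incoming packet against posted receives."""
        fifo = self.exact.get((source, tag))
        if fifo:
            return fifo.popleft()
        for i, (s, t, request) in enumerate(self.wild):
            if s in (ANY_SOURCE, source) and t in (ANY_TAG, tag):
                del self.wild[i]
                return request
        return None  # no match: a real library queues it as unexpected
```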
Application-Level Differential Checkpointing for HPC Applications with Dynamic Datasets
Kai Keller, L. Bautista-Gomez
DOI: 10.1109/CCGRID.2019.00015
Abstract: High-performance computing (HPC) requires resilience techniques such as checkpointing in order to tolerate failures in supercomputers. As the number of nodes and the amount of memory in supercomputers keep increasing, the size of checkpoint data also increases dramatically, sometimes causing an I/O bottleneck. Differential checkpointing (dCP) aims to minimize the checkpointing overhead by writing only data differences. This is typically implemented at the memory-page level, sometimes complemented with hashing algorithms. However, such a technique is unable to cope with dynamically sized datasets. In this work, we present a novel dCP implementation with a new file format that allows fragmentation of protected datasets in order to support dynamic sizes. We identify dirty data blocks using hash algorithms. To evaluate dCP performance, we ported the HPC applications xPic, LULESH 2.0, and Heat2D, and analyzed their potential for reducing I/O with dCP and how this data reduction influences checkpoint performance. In our experiments, we achieve reductions of up to 62% in checkpoint time.
Citations: 7
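Hash-based dirty-block detection, the mechanism this abstract describes for identifying data differences, can be sketched as follows; the block size and hash function are illustrative choices, not the paper's:

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative; real dCP fragments datasets, not pages

def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash every fixed-size block of a protected dataset."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def dirty_blocks(data, previous_hashes, block_size=BLOCK_SIZE):
    """Indices of blocks whose hash changed since the last checkpoint;
    a differential checkpoint writes only these blocks. Blocks beyond
    the old length (a grown dataset) always count as dirty."""
    current = block_hashes(data, block_size)
    return [i for i, digest in enumerate(current)
            if i >= len(previous_hashes) or previous_hashes[i] != digest]
```

If only a few blocks change between checkpoints, the write volume drops proportionally, which is where the reported checkpoint-time reduction comes from.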
Welcome from the General Chairs
DOI: 10.1109/ccgrid.2019.00005
Citations: 0