{"title":"EC-Shuffle: Dynamic Erasure Coding Optimization for Efficient and Reliable Shuffle in Spark","authors":"Xin Yao, Cho-Li Wang, Mingzhe Zhang","doi":"10.1109/CCGRID.2019.00014","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00014","url":null,"abstract":"Fault-tolerance capabilities attract increasing attention from existing data processing frameworks, such as Apache Spark. To avoid replaying costly distributed computation, like shuffle, local checkpoint and remote replication are two popular approaches. They incur significant runtime overhead, such as extra storage cost or network traffic. Erasure coding is another emerging technology, which also enables data resilience. It is perceived as capable of replacing the checkpoint and replication mechanisms for its high storage efficiency. However, it suffers heavy network traffic due to distributing data partitions to different locations. In this paper, we propose EC-Shuffle with two encoding schemes and optimize the shuffle-based operations in Spark or MapReduce-like frameworks. Specifically, our encoding schemes concentrate on optimizing the data traffic during the execution of shuffle operations. They only transfer the parity chunks generated via erasure coding, instead of a whole copy of all data chunks. EC-Shuffle also provides a strategy, which can dynamically select the per-shuffle biased encoding scheme according to the number of senders and receivers in each shuffle. Our analyses indicate that this dynamic encoding selection can minimize the total size of parity chunks. The extensive experimental results using BigDataBench with hundreds of mappers and reducers shows this optimization can reduce up to 50% network traffic and achieve up to 38% performance improvement.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129306112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exhaustive Study of Hierarchical AllReduce Patterns for Large Messages Between GPUs","authors":"Yuichiro Ueno, Rio Yokota","doi":"10.1109/CCGRID.2019.00057","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00057","url":null,"abstract":"Data-parallel distributed deep learning requires an AllReduce operation between all GPUs with message sizes in the order of hundreds of megabytes. The popular implementation of AllReduce for deep learning is the Ring-AllReduce, but this method suffers from latency when using thousands of GPUs. There have been efforts to reduce this latency by combining the ring with more latency-optimal hierarchical methods. In the present work, we consider these hierarchical communication methods as a general hierarchical Ring-AllReduce with a pure Ring-AllReduce on one end and Rabenseifner's algorithm on the other end of the spectrum. We exhaustively test the various combinations of hierarchical partitioning of processes on the ABCI system in Japan on up to 2048 GPUs. We develop a performance model for this generalized hierarchical Ring-AllReduce and show the lower-bound of the effective bandwidth achievable for the hierarchical NCCL communication on thousands of GPUs. Our measurements agree well with our performance model. We also find that the optimal large-scale process hierarchy contains the optimal small-scale process hierarchy so the search space for the optimal communication will be reduced.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132430250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Preliminary Fault Taxonomy for Multi-tenant SaaS Systems","authors":"V. H. S. C. Pinto, S. Souza, P. L. D. Souza","doi":"10.1109/CCGRID.2019.00032","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00032","url":null,"abstract":"Multi-tenancy is the key feature for every Software as a Service (SaaS), as it enables multiple customers, so-called tenants, to transparently share a system's resources reducing costs. Tenants can customize a system according to their particular needs, however, such a high level of complexity may open possibilities for a failure. In addition, there is a lack of a reference architecture for such applications and once the implementations differ significantly, ensuring that all executions flows have been verified without impacting the working features for other tenants is a complex task. The clear understanding of the possible faults is fundamental for the identification, tolerance and definition of appropriate testing techniques. This paper presents a preliminary fault taxonomy for multi-tenant cloud applications considering their foundational features. A literature review previously carried out, a survey with practitioners and analysis of some applications were performed to achieve this classification. In addition, an e-commerce called MtShop was developed for a case study. The expressiveness of the proposed taxonomy is illustrated with critical faults identified in the MtShop through the automated and parallel testing. We conclude with the benefits that our taxonomy can bring to testing, prediction and regression testing activity of multi-tenant cloud applications.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130772008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Performance Improvement Approach for Second-Order Optimization in Large Mini-batch Training","authors":"Hiroki Naganuma, Rio Yokota","doi":"10.1109/CCGRID.2019.00092","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00092","url":null,"abstract":"Classical learning theory states that when the number of parameters of the model is too large compared to the data, the model will overfit and the generalization performance deteriorates. However, it has been empirically shown that deep neural networks (DNN) can achieve high generalization capability by training with extremely large amount of data and model parameters, which exceeds the predictions of classical learning theory. One drawback of this is that training of DNN requires enormous calculation time. Therefore, it is necessary to reduce the training time through large scale parallelization. Straightforward data-parallelization of DNN degrades convergence and generalization. In the present work, we investigate the possibility of using second order methods to solve this generalization gap in large-batch training. This is motivated by our observation that each mini-batch becomes more statistically stable, and thus the effect of considering the curvature plays a more important role in large-batch training. We have also found that naively adapting the natural gradient method causes the generalization performance to deteriorate further due to the lack of regularization capability. We propose an improved second order method by smoothing the loss function, which allows second-order methods to generalize as well as mini-batch SGD.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124273819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks","authors":"Teng Wang, S. Byna, Glenn K. Lockwood, S. Snyder, P. Carns, Sunggon Kim, N. Wright","doi":"10.1109/CCGRID.2019.00021","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00021","url":null,"abstract":"Scientific applications frequently spend a large fraction of their execution time in reading and writing data on parallel file systems. Identifying these I/O performance bottlenecks and attributing root causes are critical steps toward devising optimization strategies. Several existing studies analyze I/O logs of a set of benchmarks or applications that were run with controlled behaviors. However, there is still a lack of general approach that systematically identifies I/O performance bottlenecks for applications running \"in the wild\" on production systems. In this study, we have developed an analysis approach of \"zooming in\" from platform-wide to application-wide to job-level I/O logs for identifying I/O bottlenecks in arbitrary scientific applications. We analyze the logs collected on a Cray XC40 system in production over a two-month period. This study results in several insights for application developers to use in optimizing I/O behavior.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129135340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CRAM: a Container Resource Allocation Mechanism for Big Data Streaming Applications","authors":"Olubisi Runsewe, N. Samaan","doi":"10.1109/CCGRID.2019.00045","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00045","url":null,"abstract":"Containerization provides a lightweight alternative to the use of virtual machines for potentially reducing service cost and improving cloud resource utilization. A key challenge is how to allocate container resources to multiple competing streaming applications with varying QoS demands running on a heterogeneous cluster of hosts. In this paper, we focus on workload distribution for optimal resource allocation to meet the real-time demands of competing containerized big data streaming applications. We propose a container resource allocation mechanism (CRAM) based on game theory and formulate the problem as an n-player non-cooperative game among a set of heterogeneous containerized streaming applications. From our analysis, we obtain the optimal Nash Equilibrium state where no player can further improve its performance without impairing others. Experimental results demonstrate the effectiveness of our approach, which attempts to equally satisfy each containerized streaming application's request as compared to existing techniques that may treat some applications unfairly.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133371215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Enabling Dynamic Resource Estimation and Correction for Improving Utilization in an Apache Mesos Cloud Environment","authors":"Gourav Rattihalli, M. Govindaraju, Devesh Tiwari","doi":"10.1109/CCGRID.2019.00033","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00033","url":null,"abstract":"Academic cloud infrastructures require users to specify an estimate of their resource requirements. The resource usage for applications often depends on the input file sizes, parameters, optimization flags, and attributes, specified for each run. Incorrect estimation can result in low resource utilization of the entire infrastructure and long wait times for jobs in the queue. We have designed a Resource Utilization based Migration (RUMIG) system to address the resource estimation problem. We present the overall architecture of the two-stage elastic cluster design, the Apache Mesos-specific container migration system, and analyze the performance for several scientific workloads on three different cloud/cluster environments. In this paper we (b) present a design and implementation for container migration in a Mesos environment, (c) evaluate the effect of right-sizing and cluster elasticity on overall performance, (d) analyze different profiling intervals to determine the best fit, (e) determine the overhead of our profiling mechanism. Compared to the default use of Apache Mesos, in the best cases, RUMIG provides a gain of 65% in runtime (local cluster), 51% in CPU utilization in the Chameleon cloud, and 27% in memory utilization in the Jetstream cloud.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124997795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective and Efficient Big Data Management in Distributed Environments: Models, Issues, and Research Perspectives","authors":"A. Cuzzocrea","doi":"10.1109/CCGRID.2019.00071","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00071","url":null,"abstract":"This paper focuses the attention on the emerging problem of effectively and efficiently managing big data in distributed environments, by also proposing research perspectives along this line of research. Most relevant state-of-the-art approaches are also reported and discussed. Finally, the paper proposes the logical architecture of a distributed system that considers the interesting case of sensor data, a significant instance of big data.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123304349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile Smart-Contract Lifecycle Governance with Incentivized Proof-of-Stake for Oligopoly-Formation Prevention","authors":"Vipin Deval, A. Norta","doi":"10.1109/CCGRID.2019.00029","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00029","url":null,"abstract":"Permissionless blockchain-enabled smart contracts execute code in a distributed peer-to-peer network system and thereby overcome undesirable effects of system centralization. Smart contracts that use proof-of-stake (PoS) algorithms for the validation of transactions have advantages over proof-of-work (PoW) in that they use less electricity and perform faster. The disadvantage of PoS algorithms is the issue of nothing to stake and the emergence of staking oligopolies. Thus, significant stakeholders might be able to create an oligopoly as miners with significant stakes have the chance to validate the transaction in a dominant position. In current smart contracts, the adoption of mobile devices is another emerging trend to manage mobile smart contracts. The advantage is spreading of a democratization effect as a large number of stakers participate in transaction validation and thereby reduce the risk of oligopolies. In our work, we aim to improve the PoS algorithm to reduce oligopoly formation in smart contracts by addressing the need for creating mobile smart contracts that are governed by a mobile lifecycle management. Additionally, we enhance the scalability and performance of smart contracts by focusing specifically on ways to incentivize PoS algorithms.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128341054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autotuning Under Tight Budget Constraints: A Transparent Design of Experiments Approach","authors":"P. Bruel, S. Masnada, B. Videau, Arnaud Legrand, J. Vincent, A. Goldman","doi":"10.1109/CCGRID.2019.00026","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00026","url":null,"abstract":"A large amount of resources is spent writing, porting, and optimizing scientific and industrial High Performance Computing applications, which makes autotuning techniques fundamental to lower the cost of leveraging the improvements on execution time and power consumption provided by the latest software and hardware platforms. Despite the need for economy, most autotuning techniques still require large budgets of costly experimental measurements to provide good results, while rarely providing exploitable knowledge after optimization. The contribution of this paper is a user-transparent autotuning technique based on Design of Experiments that operates under tight budget constraints by significantly reducing the measurements needed to find good optimizations. Our approach enables users to make informed decisions on which optimizations to pursue and when to stop. We present an experimental evaluation of our approach and show it is capable of leveraging user decisions to find the best global configuration of a GPU Laplacian kernel using half of the measurement budget used by other common autotuning techniques. We show that our approach is also capable of finding speedups of up to 50x, compared to gcc's -O3, for some kernels from the SPAPT benchmark suite, using up to 10x fewer measurements than random sampling.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129173661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}