2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID): Latest Papers

Multi-objective Container Deployment on Heterogeneous Clusters
Yang Hu, C. D. Laat, Zhiming Zhao
Abstract: Operating system (OS) containers are becoming increasingly popular in cloud computing for improving productivity and code portability. However, existing deployment scheduling solutions mainly treat each container deployment as an independent request, and focus on the single aspect of resource utilization or load balancing, or work on homogeneous clusters. In this paper, we propose a new container deployment algorithm to satisfy multiple objectives on heterogeneous clusters. We analyze the deployment requirements of container-based infrastructure and formulate the deployment problem as a vector bin packing problem with heterogeneous bins. We focus on three objectives: multi-resource guarantee, load balancing, and dependency awareness. The goal of the proposed algorithm is to improve the tradeoff between load balancing and dependency awareness with multi-resource guarantees. Based on the algorithm, we implement a prototype scheduler to deploy containers on heterogeneous clusters. We evaluate our scheduler over a wide range of workload scenarios by simulation, which shows that our scheduler significantly outperforms existing schedulers of the container orchestration platforms.
DOI: https://doi.org/10.1109/CCGRID.2019.00076
Published: 2019-05-14
Citations: 19
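The abstract above frames deployment as vector bin packing with heterogeneous bins. As a rough illustration only, the following Python sketch places containers with multi-dimensional demands onto nodes of different capacities, picking the feasible node whose peak utilization stays lowest. The `Node` class, the `place` function and the greedy rule are assumptions made for this example. They are not the authors' algorithm, and dependency awareness is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical heterogeneous bin: one capacity value per resource."""
    name: str
    capacity: tuple                      # e.g. (CPU cores, memory in GiB)
    used: list = field(default_factory=list)

    def __post_init__(self):
        if not self.used:
            self.used = [0.0] * len(self.capacity)

    def fits(self, demand):
        # Multi-resource guarantee: every dimension must fit.
        return all(u + d <= c for u, d, c in zip(self.used, demand, self.capacity))

    def peak_after(self, demand):
        # Highest per-resource utilization if the container were placed here.
        return max((u + d) / c for u, d, c in zip(self.used, demand, self.capacity))


def place(containers, nodes):
    """Greedy placement: each container goes to the feasible node
    with the lowest resulting peak utilization (a load-balancing proxy)."""
    placement = {}
    for name, demand in containers.items():
        feasible = [n for n in nodes if n.fits(demand)]
        if not feasible:
            placement[name] = None       # no node can host this container
            continue
        best = min(feasible, key=lambda n: n.peak_after(demand))
        best.used = [u + d for u, d in zip(best.used, demand)]
        placement[name] = best.name
    return placement
```

Normalizing by each node's own capacity is what makes the comparison meaningful on a heterogeneous cluster: a 2-core container is a small load on an 8-core node but saturates a 2-core node.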
A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning
Wenbin Jiang, Geyan Ye, L. Yang, Jian Zhu, Yang Ma, Xia Xie, Hai Jin
Abstract: On heterogeneous cluster systems, the convergence performances of neural network models are greatly troubled by the different performances of machines. In this paper, we propose a novel distributed Stochastic Gradient Descent (SGD) algorithm named Grouping-SGD for distributed deep learning, which converges faster than Sync-SGD, Async-SGD, and Stale-SGD. In Grouping-SGD, machines are partitioned into multiple groups, ensuring that machines in the same group have similar performances. Machines in the same group update the models synchronously, while different groups update the models asynchronously. To improve the performance of Grouping-SGD further, the parameter servers are arranged from fast to slow, and they are responsible for updating the model parameters from the lower layer to the higher layer respectively. The experimental results indicate that Grouping-SGD can achieve 1.2-3.7 times speedups using popular image classification benchmarks: MNIST, Cifar10, Cifar100, and ImageNet, compared to Sync-SGD, Async-SGD, and Stale-SGD.
DOI: https://doi.org/10.1109/CCGRID.2019.00053
Published: 2019-05-14
Citations: 8
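The grouping idea in the abstract above (similar-speed machines synchronize with each other, while groups update independently) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `max_ratio` threshold, the function names and the plain-list gradients are hypothetical, and the abstract does not say how the authors measure machine performance or form groups.

```python
def group_by_speed(throughput, max_ratio=1.25):
    """Partition machines so that, within a group, the fastest machine is at
    most `max_ratio` times faster than the slowest one.

    throughput: dict mapping machine name -> measured samples per second.
    Returns a list of groups (lists of names), slowest group first.
    """
    groups = []
    for name, speed in sorted(throughput.items(), key=lambda kv: kv[1]):
        # groups[-1][0] is the slowest member of the current group.
        if groups and speed <= groups[-1][0][1] * max_ratio:
            groups[-1].append((name, speed))
        else:
            groups.append([(name, speed)])
    return [[name for name, _ in group] for group in groups]


def average_gradients(grads):
    """Synchronous step inside one group: element-wise mean of the gradients."""
    n = len(grads)
    return [sum(column) / n for column in zip(*grads)]


def apply_group_update(params, group_grads, lr):
    """One group's update to the shared parameters. Different groups would
    call this independently (asynchronously) whenever they finish a step."""
    avg = average_gradients(group_grads)
    return [p - lr * g for p, g in zip(params, avg)]
```

The point of grouping is that a synchronous barrier then only waits for machines of comparable speed, so fast machines are not held back by stragglers in other groups.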
A Machine Learning Approach for Productive Data Locality Exploitation in Parallel Computing Systems
Engin Kayraklioglu, Erwan Favry, T. El-Ghazawi
Abstract: Data locality is of extreme importance in programming distributed-memory architectures due to its implications on latency and energy consumption. Automated compiler and runtime system optimization studies have attempted to improve data locality exploitation without burdening the programmer. However, due to the difficulty of static code analysis, conservatism in compiler optimizations to avoid errors, and cost of dynamic analysis, the efficacy of automated optimizations is limited. Therefore, programmers need to spend significant effort in optimizing locality. In this work, we present an automated code optimization framework that trains neural networks using application profiles for small data sizes that exhibit similar patterns to larger cases. The application is then modified to use the neural network to improve data locality exploitation. We prototype our framework for the Chapel language and integrate with the language stack. We experimentally demonstrate that our framework can learn access patterns and create optimized executables in minutes. The resulting executables perform more than one order of magnitude faster than unoptimized code, and are comparable to manual locality optimization without burdening the programmer and hindering productivity.
DOI: https://doi.org/10.1109/CCGRID.2019.00050
Published: 2019-05-14
Citations: 4
On Cost-Driven Computation Offloading in the Edge: A New Model Approach
Mingzhe Du, Yang Wang, Chengzhong Xu
Abstract: Computation offloading is an often-used optimization method that exploits servers with powerful and plentiful resources to maximize computation efficiency with minimum cost. In this method, a client application is usually modeled as a weighted directed acyclic graph (DAG), which is typically split into two distinct parts: one running on the client device and the other on the server machine. To simplify the model, the inter-part communication costs are always assumed to be symmetric and the intra-part communication costs are commonly ignored. Although these assumptions are reasonable for offloading in traditional mobile computing, they are no longer valid when considering the problem in the edge-cloud environment, especially with the development of microservices, where a provisioned multi-machine cluster at each side is involved. To address this problem, we propose a new offloading model in this paper, where both the intra-part communication costs and the asymmetry of inter-part communication costs are incorporated to carry out the client application, which are not a part of previous approaches. Given this model, we first prove the offloading problem is NP-hard, then design an efficient greedy algorithm to obtain a sub-optimal solution. Our numerical results show that our algorithm for the new model is always efficient to find a better offloading scheme, compared with other existing algorithms that lack the notion of communication costs between tasks co-located at the same side and the asymmetry of communication costs crossing sides.
DOI: https://doi.org/10.1109/CCGRID.2019.00063
Published: 2019-05-14
Citations: 2
Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures
J. Hashmi, S. Chakraborty, Mohammadreza Bayatpour, H. Subramoni, D. Panda
Abstract: Emerging multi-/many-cores such as Intel Xeon and Xeon Phi are widely being adopted for modern large-scale supercomputing systems. The architectural features such as high core density, mesh interconnects, deeper memory hierarchies and hardware multi-threading offered by these systems provide opportunities for application developers to exploit more parallelism. However, it also poses significant challenges for the MPI runtimes to optimize communication performance. One of the major challenges involves optimizing collective communication for dense multi-/many-core processors. Traditionally, MPI runtimes have used send/recv, direct shared-memory ("double-copy") or kernel-assisted ("single-copy") mechanisms for intra-node collective communication. However, existing collective designs that are based on these mechanisms suffer from several bottlenecks such as multiple copies, per message handshake, and kernel-level lock contention that limit their performance. In this paper, we first characterize the bottlenecks associated with the aforementioned approaches in designing collectives in MPI. Then, we propose efficient "Shared-address space"-based designs to implement different MPI collectives. Finally, we show the efficacy of our approach by implementing various MPI collectives. Our proposed designs show up to 11x, 50x, 17x, and 5x performance improvement for Bcast, Scatter, Gather, and Alltoall over other state-of-the-art MPI libraries on different multi-/many-core architectures.
DOI: https://doi.org/10.1109/CCGRID.2019.00055
Published: 2019-05-14
Citations: 6
Efficient Job Scheduling for Clusters with Shared Tiered Storage
Leah E. Lackner, H. M. Fard, F. Wolf
Abstract: New fast storage technologies such as non-volatile memory are becoming ubiquitous in HPC systems with one or two orders of magnitude higher I/O bandwidth than traditional back-end storage systems. They can be used to heavily speed-up I/O operations, an essential prerequisite for data-intensive exascale computing capabilities. However, since the overall capacity of the fast storage available in a system is limited, an individual job may not always benefit if access to fast storage implies longer waiting time in the queue. This is obvious if fast storage is shared across the system. We therefore argue that the decision of whether or not to use fast storage should be supported by the batch scheduler, which can estimate when the amount of fast storage a job desires will become available. We present a scheduling algorithm with this functionality and show in simulations significantly reduced makespan and turnaround times in comparison to always using fast storage, always using slow back-end storage, and random storage assignment.
DOI: https://doi.org/10.1109/CCGRID.2019.00046
Published: 2019-05-14
Citations: 1
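The trade-off described in the abstract above (waiting for fast storage versus starting immediately on slow storage) comes down to comparing two estimated finish times. The Python sketch below is a simplified illustration, not the paper's scheduling algorithm. All function names and parameters are hypothetical, and it assumes compute nodes are available immediately and that runtime estimates for both tiers are known.

```python
def fast_storage_available_at(now, free, releases, needed):
    """Earliest time at which `needed` units of fast storage are free.

    free: units of fast storage free right now.
    releases: iterable of (time, amount) pairs for running jobs that will
              release fast storage when they finish (assumed estimates).
    Returns None if the request can never be satisfied.
    """
    if free >= needed:
        return now
    for time, amount in sorted(releases):
        free += amount
        if free >= needed:
            return max(now, time)
    return None


def choose_storage(now, fast_available_at, runtime_fast, runtime_slow):
    """Pick the tier with the earlier estimated finish time."""
    finish_slow = now + runtime_slow
    if fast_available_at is None:
        return ("slow", finish_slow)
    finish_fast = max(now, fast_available_at) + runtime_fast
    if finish_fast < finish_slow:
        return ("fast", finish_fast)
    return ("slow", finish_slow)
```

For example, a job that runs in 100 time units on fast storage and 130 on slow storage only benefits from the fast tier if it has to wait fewer than 30 time units for it.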
Efficient Distributed Range Query Processing in Apache Spark
A. Papadopoulos, S. Sioutas, C. Zaroliagis, Nikolaos Zacharatos
Abstract: Range queries are important in many diverse applications. In its simplest one-dimensional form, a range query is expressed by an interval [a, b] on the real line, whereas the answer consists of all elements ε in [a, b]. In this work, we focus on efficient range query processing techniques in the Apache Spark engine, which is the state-of-the-art solution for big data management and analytics. We aim at developing a Spark-based indexing scheme that supports range queries in such large-scale decentralized environments and scales well w.r.t. the number of nodes and the data items stored. Towards this goal, there have been solutions in the last few years, which however turn out to be inadequate at the envisaged scale, since the classic linear or even the logarithmic complexity (for point queries) is still too expensive, whereas range query processing is even more demanding. In this paper, we go one step further and present a solution with sub-logarithmic complexity. In particular, we present SPIS (SPark-based Interpolation Search), a tree structure that outperforms the existing Spark built-in lookup techniques. We carry out an experimental evaluation by using synthetic data sets. Our experimental results demonstrate the efficiency and scalability of the proposed approach.
DOI: https://doi.org/10.1109/CCGRID.2019.00073
Published: 2019-05-14
Citations: 4
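To make the one-dimensional range query [a, b] and the interpolation-search idea concrete, here is a single-machine Python sketch over a sorted list. It is not SPIS, which the abstract describes as a tree structure on Spark; it uses classic interpolation search, which needs O(log log n) expected probes on uniformly distributed keys. The function names are chosen for this example.

```python
def interpolation_lower_bound(keys, target):
    """Index of the first element >= target in the sorted numeric list `keys`."""
    lo, hi = 0, len(keys)                # the answer always lies in [lo, hi]
    while lo < hi:
        first, last = keys[lo], keys[hi - 1]
        if target <= first:
            return lo
        if target > last:
            return hi
        # Here first < target <= last, so last > first and the division is safe.
        # Guess the position by linear interpolation between the end values.
        pos = lo + int((target - first) * (hi - 1 - lo) / (last - first))
        pos = min(pos, hi - 1)           # guard against floating-point overshoot
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos
    return lo


def range_query(keys, a, b):
    """All elements k of the sorted list `keys` with a <= k <= b."""
    i = interpolation_lower_bound(keys, a)
    result = []
    while i < len(keys) and keys[i] <= b:
        result.append(keys[i])
        i += 1
    return result
```

After the start of the interval is located, the scan costs time proportional to the number of reported elements, so the search step is the part that benefits from doing better than binary search.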
Towards Efficient Solvers for Optimisation Problems
H. Vo
Abstract: Constraint programming (CP) is pervasive and widely used to solve real-time problems whose input data can scale to huge sizes and whose results must be delivered efficiently and dynamically. Technologies such as CP, hybrid technologies, mixed integer programming (MIP), constraint-based local search (CBLS), and boolean satisfiability (SAT) can have different solvers and backends for such problems. The streaming videos problem requires deciding which videos to put in which cache servers in order to minimise the waiting time for all requests, given a description of the cache servers, network endpoints, and videos. In this paper, we model the streaming videos problem in two different ways. The first model is implemented using heuristics, and the second model uses global constraints. The experiments are benchmarked using MiniZinc, an open-source constraint modelling language that can be used to model constraint satisfaction and optimisation problems in a high-level, solver-independent way. The aim of the paper is to benchmark these technologies by evaluating the execution time and final scores of the two models on large instances of input data from Google Hash Code.
DOI: https://doi.org/10.1109/CCGRID.2019.00030
Published: 2019-05-14
Citations: 0
Efficient Congestion Management for High-Speed Interconnects using Adaptive Routing
José Rocher-González, J. Escudero-Sahuquillo, P. García, F. Quiles, Gaspar Mora
Abstract: The interconnection network is the central element in high-performance computing (HPC) clusters and Datacenters, where thousands of end nodes must communicate in a fast and reliable manner. The network performance depends on several design choices, such as the topology, the routing algorithm, the switch architecture, etc. Highly efficient routing algorithms, either deterministic or adaptive, have been proposed to smartly balance traffic flows in cost-effective network topologies, but their performance is reduced in scenarios where congestion and their negative effects (e.g. the HoL blocking) appear. In particular, in scenarios where congestion is intense and persistent, the HoL blocking may degrade dramatically the performance of adaptive routing algorithms, since they may spread congested traffic flows through all the available routes. In addition, as we have shown in previous studies, this spreading of congested flows may spoil the performance of the static queuing schemes that are used to reduce HoL blocking by separating flows into different queues at switch buffers. Indeed, as these schemes are based on a static criterion defined prior to the traffic injection in the network, they are unable to avoid that congested and non-congested flows share queues when paired with adaptive routing. In this paper, we propose to use some existing static queuing schemes and dynamic allocation of virtual channels (VCs) to isolate into a single VC the flows whose routes have been adaptively routed, in order to prevent the impact of the congestion spreading through several routes. Basically, adapted flows are moved to a special adapted-flow channel (AFC), so that they do not interact with flows mapped to other VCs by the static queuing scheme. In this way, the HoL blocking that adaptively routed flows could cause to non-adaptive flows is prevented, even if congested flows have been spread through several routes. On the other hand, the static queuing scheme will reduce without any interference the HoL blocking that may appear among non-adaptive flows. To evaluate our proposal we have conducted extensive simulation experiments modeling large interconnection networks based on the fat-tree topology. From the obtained results, we can conclude that our approach efficiently and significantly reduces the HoL blocking impact in interconnection networks using adaptive routing and queuing schemes when congestion appears.
DOI: https://doi.org/10.1109/CCGRID.2019.00036
Published: 2019-05-14
Citations: 1
Beyond Load Balancing: Package-Aware Scheduling for Serverless Platforms
Gabriel Aumala, Edwin F. Boza, Luis Ortiz-Avilés, Gustavo Totoy, Cristina L. Abad
Abstract: Fast deployment and execution of cloud functions in Function-as-a-Service (FaaS) platforms is critical, for example, for microservices architectures. However, functions that require large packages or libraries are bloated and start slowly. An optimization is to cache packages at the worker nodes instead of bundling them with the functions. However, existing FaaS schedulers are vanilla load balancers, agnostic of packages cached in response to prior function executions, and cannot properly reap the benefits of package caching. We study the case of package-aware scheduling and propose PASch, a novel scheduling algorithm that seeks package affinity during scheduling so that worker nodes can re-use execution environments with preloaded packages. PASch leverages consistent hashing and the power of 2 choices, while actively avoiding worker overload. We implement PASch in a new scheduler for the OpenLambda framework and evaluate it using simulations and real experiments. When using PASch instead of a least loaded balancer, tasks perceive an average speedup of 1.29x, and 80th percentile latency that is 23x faster. Furthermore, for the workload studied in this paper, PASch outperforms consistent hashing with bounded loads (a state-of-the-art load balancing algorithm), yielding a 1.3x average speedup, and a speedup of 1.5x at the 80th percentile.
DOI: https://doi.org/10.1109/CCGRID.2019.00042
Published: 2019-05-14
Citations: 36
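The three ingredients named in the abstract above (consistent hashing, the power of 2 choices, and overload avoidance) can be combined as in the following Python sketch. It is an illustration under assumptions rather than the PASch implementation: the `Ring` class, the salted two-candidate lookup, the `max_load` threshold and the least-loaded fallback are all choices made for this example.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash, independent of Python's per-process hash seed."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, workers, vnodes=64):
        self._points = sorted(
            (_hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[i][1]


def pick_worker(ring, load, package, max_load):
    """Choose a worker for a task that needs `package`.

    load: dict mapping worker name -> number of running tasks.
    Two salted lookups give two affinity candidates for the package; the less
    loaded one wins. If even that one is at or above `max_load`, affinity is
    given up and the task goes to the globally least loaded worker.
    """
    candidates = {ring.lookup(package + ":0"), ring.lookup(package + ":1")}
    best = min(candidates, key=lambda w: (load[w], w))
    if load[best] >= max_load:
        best = min(load, key=lambda w: (load[w], w))
    return best
```

Because the candidates depend only on the package name, repeated requests for the same package keep landing on the same one or two workers, which is where a cached copy of the package is most likely to be found.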