2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID): Latest Publications

A Machine Learning Auditing Model for Detection of Multi-Tenancy Issues Within Tenant Domain
Cleverton Vicentini, A. Santin, E. Viegas, Vilmar Abreu
DOI: 10.1109/CCGRID.2018.00081
Abstract: Cloud computing is intrinsically based on multi-tenancy, which enables a physical host to be shared among several tenants (customers). In this context, a cloud provider may, for several reasons, overload a physical machine by hosting more tenants than it can adequately handle. In such a case, a tenant may experience application performance issues but cannot identify their causes, since most cloud providers either do not expose performance metrics for customer monitoring or expose metrics that can be biased. This study proposes a two-tier auditing model for identifying multi-tenancy issues within the tenant domain. Our proposal relies on machine learning techniques fed with application and virtual-resource metrics, gathered within the tenant domain, to identify overloaded resources in a distributed application context. An evaluation using Apache Storm as a case study shows that our proposal identifies a node experiencing multi-tenancy interference of at least 6%, with false-positive and false-negative rates below 1%, regardless of the affected resource. Moreover, our model generalizes the multi-tenancy interference behavior learned from private cloud testbed monitoring to different hardware configurations. Thus, a system administrator can monitor an application in a public cloud provider without possessing any hardware-level performance metrics.
Published: May 2018. Citations: 10.
An Empirical Evaluation of Allgatherv on Multi-GPU Systems
Thomas B. Rolinger, T. Simon, Christopher D. Krieger
DOI: 10.1109/CCGRID.2018.00027
Abstract: Applications for deep learning and big data analytics have compute and memory requirements that exceed the limits of a single GPU. However, effectively scaling out an application to multiple GPUs is challenging due to the complexities of inter-GPU communication, particularly for collective communication with irregular message sizes. In this work, we provide a performance evaluation of the Allgatherv routine on multi-GPU systems, focusing on GPU network topology and the communication library used. We present results from the OSU micro-benchmark and conduct a case study of sparse tensor factorization, an application that uses Allgatherv with highly irregular message sizes. We extend our existing tensor factorization tool to run on systems with different node counts and varying numbers of GPUs per node. We then evaluate the communication performance of our tool when using traditional MPI, CUDA-aware MVAPICH, and NCCL across a suite of real-world data sets on three different systems: a 16-node cluster with one GPU per node, NVIDIA's DGX-1 with 8 GPUs, and Cray's CS-Storm with 16 GPUs. Our results show that irregularity in the tensor data sets produces trends that contradict those in the OSU micro-benchmark, as well as trends that are absent from the benchmark.
Published: May 2018. Citations: 2.
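The irregular collective this paper benchmarks, Allgatherv, lets every rank contribute a differently sized chunk while every rank ends up holding the full concatenation. As a hedged illustration of those semantics only (the paper measures real MPI/NCCL implementations; the function name and data here are invented), a single-process sketch:

```python
# Single-process illustration of Allgatherv semantics: each "rank"
# contributes a chunk of arbitrary size, and afterwards every rank
# holds the concatenation of all chunks in rank order.

def allgatherv(local_chunks):
    """local_chunks[r] is rank r's send buffer; the return value is
    the receive buffer that every rank holds after the collective."""
    gathered = []
    for chunk in local_chunks:  # irregular sizes are the whole point
        gathered.extend(chunk)
    return gathered

# Three "ranks" with highly irregular message sizes, as arises in
# sparse tensor factorization:
chunks = [[1], [2, 3, 4, 5], [6, 7]]
print(allgatherv(chunks))  # every rank receives [1, 2, 3, 4, 5, 6, 7]
```

The real routine additionally needs per-rank receive counts and displacements (MPI's `recvcounts`/`displs`), which is precisely where irregularity complicates GPU implementations.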
Towards Massive Consolidation in Data Centers with SEaMLESS
A. Segalini, Dino Lopez Pacheco, Quentin Jacquemart
DOI: 10.1109/CCGRID.2018.00038
Abstract: In data centers (DCs), an abundance of virtual machines (VMs) remain idle, either because their network services are awaiting incoming connections or because of established-but-idle sessions. These VMs waste RAM, the scarcest resource in DCs, by locking their allocated memory. In this paper, we introduce SEaMLESS, a solution designed to (i) transform fully-fledged idle VMs into lightweight and resourceless virtual network functions (VNFs) and then (ii) reclaim the memory allocated to those idle VMs. By replacing idle VMs with VNFs, SEaMLESS restores a VM quickly upon detecting user activity, with limited impact on the Quality of Experience (QoE). Our results show that SEaMLESS can consolidate hundreds of VMs as VNFs onto a single machine and is thus able to release the majority of the memory allocated to idle VMs. The freed memory can then be reassigned to new VMs or enable massive consolidation, leading to better utilization of DC resources.
Published: May 2018. Citations: 1.
SuperCell: Adaptive Software-Defined Storage for Cloud Storage Workloads
K. Uehara, Yu Xiang, Y. Chen, M. Hiltunen, Kaustubh R. Joshi, R. Schlichting
DOI: 10.1109/CCGRID.2018.00025
Abstract: The explosive growth of data driven by the increasing adoption of cloud technologies in the enterprise has created strong demand for more flexible, cost-effective, and scalable storage solutions. Many storage systems, however, are poorly matched to the workloads they serve because it is difficult to configure a storage system optimally a priori with only approximate knowledge of the workload characteristics. This paper shows how cloud-based orchestration can be leveraged to create flexible storage solutions that continuously adapt to their target application workloads and, in doing so, provide superior performance, cost, and scalability over traditional fixed designs. To demonstrate this approach, we have built "SuperCell," a Ceph-based distributed storage solution with a recommendation engine for storage configuration. SuperCell gives storage operators real-time recommendations on how to reconfigure the storage system to optimize its performance, cost, and efficiency, based on statistical storage modeling and analysis of the actual workload. Using real cloud storage workloads, we experimentally demonstrate that SuperCell reduces storage system cost by up to 48% while meeting the service level agreement (SLA) 99% of the time, a level that no static design meets for these workloads.
Published: May 2018. Citations: 3.
Improving Energy Efficiency of Database Clusters Through Prefetching and Caching
Yi Zhou, Shubbhi Taneja, Mohammed I. Alghamdi, X. Qin
DOI: 10.1109/CCGRID.2018.00065
Abstract: The goal of this study is to optimize the energy efficiency of database clusters through prefetching and caching strategies. We design a workload-skewness scheme to collectively manage a set of hot and cold nodes in a database cluster system. The prefetching mechanism fetches popular data tables to the hot nodes while keeping unpopular data on the cold nodes. We leverage a power management module that aggressively switches cold nodes into a low-power mode to conserve energy. We construct a prefetching model and an energy-saving model to govern the power management module in database clusters. The energy-efficient prefetching and caching mechanism reduces the number of power-state transitions, thereby offering high energy efficiency. We systematically evaluate the energy conservation technique in the process of managing, fetching, and storing data on clusters supporting database applications. Our experimental results show that our prefetching/caching solution significantly improves the energy efficiency of the existing PostgreSQL system.
Published: May 2018. Citations: 2.
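The hot/cold split described above amounts to ranking tables by popularity, prefetching the most popular onto hot nodes, and leaving the rest on cold nodes that can drop into a low-power mode. A minimal sketch of that placement idea, with all names, data, and the capacity parameter invented for illustration (the paper's actual models are more elaborate):

```python
# Hedged sketch of popularity-based hot/cold placement: the most
# frequently accessed tables are prefetched to "hot" nodes; the rest
# stay on "cold" nodes that a power manager may switch to low power.

def place_tables(popularity, hot_capacity):
    """popularity: {table: access_count}.
    Returns (hot, cold) sets of table names; the hot set holds the
    hot_capacity most popular tables."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return set(ranked[:hot_capacity]), set(ranked[hot_capacity:])

pop = {"orders": 900, "users": 750, "logs": 40, "archive": 5}
hot, cold = place_tables(pop, hot_capacity=2)
print(hot)   # {'orders', 'users'}  -> prefetched to hot nodes
print(cold)  # {'logs', 'archive'} -> cold nodes may power down
```

Skewing accesses toward hot nodes this way is what lets cold nodes stay in low-power mode for long stretches, reducing the power-state transitions the abstract mentions.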
SHAD: The Scalable High-Performance Algorithms and Data-Structures Library
Vito Giovanni Castellana, Marco Minutoli
DOI: 10.1109/CCGRID.2018.00071
Abstract: The unprecedented amount of data that must be processed in emerging data analytics applications poses novel challenges to industry and academia. Scalability and high performance become more than desirable features: given the scale and nature of the problems, they draw the line between what is achievable and what is unfeasible. In this paper, we propose SHAD, the Scalable High-performance Algorithms and Data-structures library. SHAD adopts a modular design that confines low-level details and promotes reuse. SHAD's core is built on an Abstract Runtime Interface, which enhances portability and identifies the minimal set of features the framework requires from the underlying system. The core library includes common data structures such as Array, Vector, Map, and Set. These are designed to accommodate significant amounts of data accessed in massively parallel environments, and to serve as building blocks for SHAD extensions, i.e., higher-level software libraries. We have validated and evaluated our design with a performance and scalability study of the core components of the library. We have validated the design's flexibility by proposing a Graph Library as an example SHAD extension, which implements two different graph data structures; we evaluate their performance with a set of graph applications. Experimental results show that the approach is promising in terms of both performance and scalability. On a distributed system with 320 cores, SHAD Arrays sustain a throughput of 65 billion operations per second, while SHAD Maps sustain 1 billion operations per second. Algorithms implemented using the Graph Library exhibit performance and scalability comparable to a custom solution, but with smaller development effort.
Published: May 2018. Citations: 6.
Evaluation of Highly Available Cloud Streaming Systems for Performance and Price
Dung Nguyen, André Luckow, Edward B. Duffy, Ken E. Kennedy, A. Apon
DOI: 10.1109/CCGRID.2018.00056
Abstract: This paper presents a systematic evaluation of Amazon Kinesis and Apache Kafka for meeting highly demanding application requirements. Results show that Kinesis and Kafka can provide high reliability, performance, and scalability. Cost and performance trade-offs of Kinesis and Kafka are presented for a variety of application data rates, resource utilization levels, and resource configurations.
Published: May 2018. Citations: 10.
Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster
Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu
DOI: 10.1109/CCGRID.2018.00043
Abstract: Deep learning is now the most promising approach to developing intelligent computer systems. To speed up the training of neural networks, researchers have designed many distributed learning algorithms. These algorithms typically use a constant communication period for model/gradient exchange. We find that this communication pattern can incur unnecessary and inefficient data transmission for some training methods, e.g., elastic SGD and gossiping SGD. In this paper, we propose an adaptive communication method to improve the performance of gossiping SGD. Instead of using a fixed period for model exchange, we exchange models with other machines according to the change in the local model. This makes communication more efficient and thus improves performance. Experimental results show that our method reduces communication traffic by 92%, which yields a 52% reduction in training time while preserving prediction accuracy compared with gossiping SGD.
Published: May 2018. Citations: 4.
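The core idea above, replacing a fixed exchange period with a trigger driven by how much the local model has changed, can be sketched as follows. This is a hedged reading of the abstract, not the paper's algorithm: the drift measure (Euclidean distance) and the threshold are illustrative assumptions.

```python
# Hedged sketch of adaptive communication for gossiping SGD: a worker
# pushes its model to a peer only when the model has drifted far enough
# from the last version it sent, instead of every k steps.
import math

def drift(current, last_sent):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((c - l) ** 2 for c, l in zip(current, last_sent)))

def should_exchange(current, last_sent, threshold):
    """Communicate only when local change exceeds the threshold."""
    return drift(current, last_sent) >= threshold

last_sent = [0.0, 0.0]
model = [0.1, 0.1]   # small local update: skip communication
print(should_exchange(model, last_sent, threshold=0.5))  # False
model = [0.4, 0.4]   # larger drift: exchange with a peer
print(should_exchange(model, last_sent, threshold=0.5))  # True
```

Skipping exchanges while the model is barely changing is what produces the large traffic reduction the abstract reports, at the cost of choosing a sensible drift threshold.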
Achieving Performance Balance Among Spark Frameworks with Two-Level Schedulers
Aleksandra Kuzmanovska, H. V. D. Bogert, R. H. Mak, D. Epema
DOI: 10.1109/CCGRID.2018.00028
Abstract: When multiple data-processing frameworks with time-varying workloads are simultaneously present in a single cluster or data center, an apparent goal is to have them experience equal performance, expressed in whatever performance metrics are applicable. In modern data-center environments, Two-Level Schedulers (TLSs), which leave the scheduling of individual jobs to the schedulers within the data-processing frameworks, are typically used to manage the resources of those frameworks. Two TLSs with opposite designs are Mesos and Koala-F. Mesos employs fine-grained resource allocation and aims at Dominant Resource Fairness (DRF) among framework instances by offering resources to them for the duration of a single task. In contrast, Koala-F aims at performance fairness among framework instances by employing dynamic coarse-grained allocation of sets of complete nodes, based on performance feedback from individual instances. The goal of this paper is to explore the trade-offs between these two TLS designs when trying to achieve performance balance among frameworks. We select Apache Spark as a representative data-processing framework and perform experiments on a modest-sized cluster, using jobs chosen from commonly used data-processing benchmarks. Our results reveal that achieving performance balance among framework instances is a challenge for both TLS designs, despite their opposite design choices. Moreover, we exhibit design flaws in the DRF allocation policy that prevent Mesos from achieving performance balance. Finally, to remedy these flaws, we propose a feedback controller for Mesos that dynamically adapts framework weights, as used in Weighted DRF (W-DRF), based on their performance.
Published: May 2018. Citations: 0.
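The DRF policy discussed in this entry is a well-known allocation rule: each framework's "dominant share" is its largest fraction of any single resource, and the next resource offer goes to the framework with the smallest dominant share. A minimal sketch of that rule, with cluster capacities and allocations invented for illustration:

```python
# Hedged sketch of Dominant Resource Fairness (DRF): offer resources
# next to the framework whose dominant share (its maximum fraction of
# any one resource) is currently smallest.

def dominant_share(allocated, capacity):
    """Max over resources of allocated fraction, e.g. max(cpu%, mem%)."""
    return max(allocated[r] / capacity[r] for r in capacity)

def next_offer(allocations, capacity):
    """Framework that should receive the next resource offer."""
    return min(allocations,
               key=lambda f: dominant_share(allocations[f], capacity))

capacity = {"cpu": 9, "mem": 18}
allocations = {
    "A": {"cpu": 2, "mem": 8},  # dominant share 8/18 ~ 0.44 (memory)
    "B": {"cpu": 3, "mem": 2},  # dominant share 3/9  ~ 0.33 (CPU)
}
print(next_offer(allocations, capacity))  # B has the smaller dominant share
```

Note that DRF balances resource shares, not observed performance, which is exactly the gap the paper's W-DRF feedback controller targets.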
A Scalable Unified Model for Dynamic Data Structures in Message Passing (Clusters) and Shared Memory (Multicore CPUs) Computing Environments
G. Laccetti, M. Lapegna, R. Montella
DOI: 10.1109/CCGRID.2018.00007
Abstract: Concurrent data structures are widely used at many levels of the software stack, ranging from high-level parallel scientific applications to low-level operating systems. The key issue with these objects is their concurrent use by several computing units (threads or processes): their design is much more difficult than that of their sequential counterparts, because their extremely dynamic nature requires protocols to ensure data consistency, with a significant cost overhead. In this regard, several studies emphasize a tension between the sequential correctness of concurrent data structures and the scalability of the algorithms; in many cases the data-structure design must be rethought, using approaches based on randomization and/or redistribution techniques, in order to fully exploit the computational power of recent computing environments. The problem has grown in importance with the new generation of High Performance Computing systems aimed at extreme performance. Such systems are based on heterogeneous architectures integrating several independent nodes in the form of clusters or MPP systems, where each node is composed of powerful computing elements (CPU cores, GPUs, or other acceleration devices) sharing the resources of a single node. These systems therefore make massive use of communication libraries to exchange data among nodes, as well as other tools to manage the shared resources inside a single node. For this reason, developing algorithms and scientific software for dynamic data structures on these heterogeneous systems requires a suitable combination of methodologies and tools to deal with the different kinds of parallelism corresponding to each specific device, so as to be aware of the underlying platform. The present work introduces a scalable model for managing a special class of dynamic data structures known as heap-based priority queues (or simply heaps) on these heterogeneous architectures. A heap is generally used when an application needs a set of data that does not require a complete ordering, but only access to some items tagged with high priority. To ensure a trade-off between correct access to high-priority items by the several computing units and low communication and synchronization overhead, a suitable reorganization of the heap is needed. More precisely, we introduce a unified scalable model that can be used, without modification, to redeploy the items of a heap both in message-passing environments (such as clusters or MPP multicomputers with several nodes) and in shared-memory environments (such as multicore CPUs and multiprocessors), with an overhead independent of the number of computing units. Computational results from applying the proposed strategy to several numerical case studies are presented for different types of computing environments.
Published: May 2018. Citations: 1.
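The abstract's key object, a heap used by several computing units that only ever need the highest-priority items, can be illustrated with a small sketch. The per-unit partitioning below (one local min-heap per unit, a global pop taking the best local top) is an illustrative reading of the redistribution idea, not the paper's actual protocol, and all names are invented.

```python
# Hedged sketch of a partitioned heap-based priority queue: items are
# spread across per-unit local heaps so units mostly work locally, and
# a global pop returns the highest-priority (smallest-key) item overall.
import heapq

class PartitionedHeap:
    def __init__(self, units):
        self.heaps = [[] for _ in range(units)]  # one min-heap per unit

    def push(self, priority, item):
        # Route to the least-loaded unit to keep the heaps balanced.
        target = min(self.heaps, key=len)
        heapq.heappush(target, (priority, item))

    def pop(self):
        # Take the globally smallest top element across non-empty heaps.
        best = min((h for h in self.heaps if h), key=lambda h: h[0])
        return heapq.heappop(best)

pq = PartitionedHeap(units=2)
for prio, task in [(3, "log"), (1, "alarm"), (2, "update")]:
    pq.push(prio, task)
print(pq.pop())  # (1, 'alarm') -- highest priority, whichever unit holds it
```

In a real message-passing or shared-memory deployment, the interesting part is exactly what this sketch hides: redistributing items among units so the global pop stays correct without a synchronization cost that grows with the number of units.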