{"title":"DMC: A Differential Marketplace for Cloud Resources","authors":"Abhinandan S. Prasad, M. Arumaithurai, David Koll, Xiaoming Fu","doi":"10.1109/CCGRID.2019.00034","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00034","url":null,"abstract":"The currently trending paradigms of edge and fog computing attempt to provide services close to the end user, to meet the demands of latency-sensitive applications and to limit bandwidth consumption in the network core. One open issue is the pricing of edge and fog resources. Current pricing schemes are usually oligopolistic and not fair. In this work, we propose DMC, a marketplace that can dynamically determine a fair price for arbitrary resource types and instances based on the supply and demand in that period. Unlike state-of-the-art solutions, DMC performs integral allocation of resources and thereby avoids the unbounded integrality gap. Additionally, DMC provides differential pricing among instances to allow varying prices based on the perceived value of a resource. We evaluate DMC with both heavy- and non-heavy-tailed distributions, reflecting diverse buying interests and numbers of resources sold, to demonstrate the feasibility of our solution for several realistic scenarios. We observe that (i) DMC arrives at market-clearing prices; (ii) DMC generates 10x to 100x more profit than state-of-the-art solutions, while still maximizing the Nash Social Welfare to achieve prices that are fair to both buyers and resource providers; and (iii) the computation time for DMC does not exceed 10 seconds even in the case of 500 resource types with 500 buyers each, making it applicable for real-time use cases.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127140612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
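The supply-and-demand pricing idea can be illustrated with a toy uniform-price auction: sort bids from highest to lowest and let the marginal bid that exhausts supply set the clearing price. This is a minimal, hedged sketch of the general mechanism only, not DMC's actual algorithm (which performs integral allocation and maximizes Nash Social Welfare); the function name and bid format are invented for the example.

```python
def clearing_price(bids, supply):
    """bids: list of (price, quantity) pairs buyers are willing to pay.
    supply: number of identical units available.
    Returns the price of the marginal bid that exhausts supply."""
    demand = 0
    # Fill supply greedily from the highest-priced bids downward.
    for price, qty in sorted(bids, reverse=True):
        demand += qty
        if demand >= supply:
            return price  # the marginal bid sets the market-clearing price
    return 0.0  # total demand below supply: the market does not clear
```

For example, with bids (10, 2), (8, 3), (5, 5) and a supply of 5 units, the first two bids exhaust supply, so the clearing price is 8.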
{"title":"Lamda-Flow: Automatic Pushdown of Dataflow Operators Close to the Data","authors":"Raúl Gracia Tinedo, Marc Sánchez Artigas, P. López, Y. Moatti, Filip Gluszak","doi":"10.1109/CCGRID.2019.00022","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00022","url":null,"abstract":"Modern data analytics infrastructures are composed of physically disaggregated compute and storage clusters. Thus, dataflow analytics engines, such as Apache Spark or Flink, are left with no choice but to transfer datasets to the compute cluster prior to their actual processing. For large data volumes, this becomes problematic, since it involves massive data transfers that exhaust network bandwidth, waste compute cluster memory, and may become a performance barrier. To overcome this problem, we present λFlow: a framework for automatically pushing dataflow operators (e.g., map, flatMap, filter) down onto the storage layer. The novelty of λFlow is that it manages the pushdown granularity at the operator level, which makes the problem unique. To wit, it requires addressing several challenges, such as how to encapsulate dataflow operators and execute them on the storage cluster, and how to keep track of dependencies so that operators can be pushed down safely onto the storage layer. Our evaluation reports significant reductions in resource usage for a large variety of IO-bound jobs. For instance, λFlow was able to reduce both network bandwidth and memory requirements by 90% in Spark. Our Flink experiments also demonstrate the extensibility of λFlow to other engines.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124854492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
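The operator-pushdown idea — shipping stateless dataflow operators to where the data sits so that only the reduced result crosses the network — can be sketched as an operator chain applied on the storage side. This is a generic illustration of the concept, not λFlow's implementation; the `(kind, fn)` operator encoding is invented for the example.

```python
def apply_pushdown(records, operators):
    """Apply a chain of dataflow operators to records on the storage side.
    operators: ordered list of ('map' | 'filter' | 'flatmap', fn) pairs."""
    out = list(records)
    for kind, fn in operators:
        if kind == 'map':
            out = [fn(r) for r in out]           # one output per record
        elif kind == 'filter':
            out = [r for r in out if fn(r)]      # keep records passing the predicate
        elif kind == 'flatmap':
            out = [x for r in out for x in fn(r)]  # zero or more outputs per record
        else:
            raise ValueError('unknown operator kind: %s' % kind)
    return out
```

Pushing a `filter` before the transfer is the clearest win: only the surviving records need to leave the storage cluster.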
{"title":"Big Data Analytics Exploration of Green Space and Mental Health in Melbourne","authors":"Ying Hu, R. Sinnott","doi":"10.1109/CCGRID.2019.00083","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00083","url":null,"abstract":"Numerous researchers have shown that urban green space, e.g. parks and gardens, is positively associated with health and general well-being. However, these works are typically based on surveys that have many limitations related to sample size and questionnaire design. Social media offers the possibility to systematically assess how human emotion is impacted by access to green space at a far larger scale that is more representative of society. In this paper, we use Twitter data to explore the relationship between green space and human emotion (sentiment). We examine the relationship between Twitter sentiment and green space in the suburbs of Melbourne and consider the impact of socio-economic and related demographic factors. We develop a linear model to explore the effect that access to green space has on the sentiment of tweeters.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123706924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BOOTABLE: Bioinformatics Benchmark Tool Suite","authors":"Maximilian Hanussek, Felix Bartusch, Jens Krüger, O. Kohlbacher","doi":"10.1109/CCGRID.2019.00027","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00027","url":null,"abstract":"The interest in analyzing biological data on a large scale has grown over recent years. Bioinformatic applications play an important role when it comes to the analysis of huge amounts of data. Due to the large amount of biological data and/or large problem spaces, a considerable amount of computing resources is required to answer the raised research questions. In order to estimate which underlying hardware might be most suitable for the bioinformatic tools applied, a well-defined benchmark suite is required. Such a benchmark suite is useful when purchasing hardware, and even more so for larger projects aiming to establish a bioinformatics compute infrastructure. In this paper, we present BOOTABLE, our bioinformatics benchmark suite. BOOTABLE currently contains six popular and widely used bioinformatic applications representing a broad spectrum of usage characteristics. It further includes an automated installation procedure and all required datasets. BOOTABLE is available from our GitHub repository (https://github.com/MaximilianHanussek/BOOTABLE) in various formats.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133400093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Missing Data Recovery in Large-Scale, Sparse Datacenter Traces: An Alibaba Case Study","authors":"Yi Liang, Linfeng Bi, Xing Su","doi":"10.1109/CCGRID.2019.00039","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00039","url":null,"abstract":"Trace analysis is of prominent importance for datacenter performance optimization. However, due to errors in and the low execution priority of trace collection tasks, modern datacenter traces suffer from a serious missing-data problem. Previous works handle trace data recovery via statistical imputation methods. However, such methods either recover the missing data with fixed values or require users to specify the relationship model among trace attributes, which is neither feasible nor accurate given two characteristics of missing data in datacenter traces: data sparsity and complex correlations among trace attributes. To this end, we focus on a trace released by Alibaba and propose a tensor-based trace data recovery model to facilitate efficient and accurate data recovery for large-scale, sparse datacenter traces. The proposed model consists of two main phases. First, data discretization and attribute selection work together to select the trace attributes strongly correlated with the attribute whose values are missing. Then, a tensor is constructed and the missing values are recovered by employing the CANDECOMP/PARAFAC decomposition-based tensor completion method. The experimental results demonstrate that our model achieves higher accuracy than six statistical or machine learning-based methods.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131288330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
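The core recovery idea — factorize the observed entries and use the factors to fill the gaps — can be shown in miniature with a rank-1, two-way case solved by alternating least squares. The paper uses CANDECOMP/PARAFAC completion on a higher-order tensor; this simplified sketch only illustrates the principle, and all names and data are invented.

```python
def rank1_complete(observed, n_rows, n_cols, iters=50):
    """Recover a matrix M ~= u v^T from a sparse set of observed entries.
    observed: dict {(i, j): value} of known entries."""
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Alternate least-squares updates of u and v over observed entries only.
        for i in range(n_rows):
            num = sum(v[j] * x for (r, j), x in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i)
            if den:
                u[i] = num / den
        for j in range(n_cols):
            num = sum(u[i] * x for (i, c), x in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            if den:
                v[j] = num / den
    # The outer product u v^T fills in the missing entries.
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]
```

With noiseless rank-1 data, hiding one entry of the outer product of [1, 2, 3] and [4, 5] and running the sketch recovers the hidden value from the correlations in the remaining entries.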
{"title":"Game Theoretic-Based Approaches for Cybersecurity-Aware Virtual Machine Placement in Public Cloud Clusters","authors":"Soamar Homsi, Gang Quan, Wujie Wen, Gustavo A. Chaparro-Baquero, L. Njilla","doi":"10.1109/CCGRID.2019.00041","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00041","url":null,"abstract":"Allocating several Virtual Machines (VMs) onto a single server helps to increase cloud computing resource utilization and to reduce operating expenses. However, multiplexing VMs with different security levels on a single server gives rise to major VM-to-VM cybersecurity interdependency risks. In this paper, we address the problem of static VM allocation with cybersecurity loss awareness by modeling it as a two-player zero-sum game between an attacker and a provider. We first obtain optimal solutions by employing a mathematical programming approach. We then seek to find the optimal solutions by quickly identifying the equilibrium allocation strategies in our formulated zero-sum game. By \"equilibrium\" we mean that neither the provider nor the attacker has any incentive to deviate from its chosen strategy. Specifically, we study the characteristics of the game model and, based on them, develop effective and efficient allocation algorithms. Simulation results show that our proposed cybersecurity-aware consolidation algorithms can significantly outperform the commonly used multi-dimensional bin packing approaches for large-scale cloud data centers.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124294835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
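The zero-sum formulation can be illustrated with fictitious play, a classic iterative scheme in which each player repeatedly best-responds to the opponent's empirical strategy; the empirical mixtures yield bounds that sandwich the game value. The authors solve their game via mathematical programming and equilibrium analysis — this is only a generic sketch on a toy 2x2 payoff matrix, with invented names.

```python
def fictitious_play(A, rounds=20000):
    """A[i][j]: payoff to the row player (maximizer) when row i meets column j.
    Returns (lower, upper) bounds on the game value from empirical play."""
    m, n = len(A), len(A[0])
    row_counts = [0] * m  # how often each row strategy was played
    col_counts = [0] * n
    for _ in range(rounds):
        # Each player best-responds to the opponent's empirical mixture.
        i = max(range(m), key=lambda r: sum(A[r][c] * col_counts[c] for c in range(n)))
        j = min(range(n), key=lambda c: sum(A[r][c] * row_counts[r] for r in range(m)))
        row_counts[i] += 1
        col_counts[j] += 1
    # Security value of the row mixture (<= v*) and of the column mixture (>= v*).
    lower = min(sum(A[r][c] * row_counts[r] for r in range(m)) / rounds for c in range(n))
    upper = max(sum(A[r][c] * col_counts[c] for c in range(n)) / rounds for r in range(m))
    return lower, upper
```

For the matrix [[3, -1], [-2, 4]] the exact game value is 1 (row mixture 0.6/0.4), and the returned bounds bracket it ever more tightly as the number of rounds grows.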
{"title":"Reproducible Scientific Workflows for High Performance and Cloud Computing","authors":"Felix Bartusch, Maximilian Hanussek, Jens Krüger, O. Kohlbacher","doi":"10.1109/CCGRID.2019.00028","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00028","url":null,"abstract":"Many complex data analysis tasks are performed by scientific workflows and pipelines deployed on high performance computing (HPC) or cloud computing resources. The complex software stack required by a workflow and unnoticed dependencies can make the deployment of a pipeline a demanding task. Once deployed, workflows tend to be black boxes, especially for users that did not create the pipeline themselves. At the end of a project a researcher should archive the pipeline in order to ensure reproducibility of published results. This paper illustrates a possible solution for each of the three tasks: reproducible deployment via software containers, automated generation of provenance information to break black boxes, and using the CiTAR service for archiving software containers.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127268762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Cost of Acking in Data Stream Processing Systems","authors":"Alessio Pagliari, F. Huet, G. Urvoy-Keller","doi":"10.1109/CCGRID.2019.00047","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00047","url":null,"abstract":"The widespread use of social networks and applications such as IoT networks generates a continuous stream of data that companies and researchers want to process, ideally in real time. Data stream processing (DSP) systems enable such continuous data analysis by implementing the set of operations to be performed on the stream as a directed acyclic graph (DAG) of tasks. While these DSP systems embed mechanisms to ensure fault tolerance and message reliability, only a few studies focus on the impact of these mechanisms on the performance of applications at runtime. In this paper, we demonstrate the impact of the message reliability mechanism on application performance. We take an experimental approach, using the Storm middleware, to study an acknowledgment-based framework. We compare the two standard schedulers available in Storm with applications of various degrees of parallelism, over single- and multi-cluster scenarios. We show that the acking layer may create an unforeseen bottleneck due to the placement of acking tasks; a problem which, to the best of our knowledge, has been overlooked in the scientific and technical literature. We propose two strategies for improving the placement of acking tasks and demonstrate their benefits in terms of throughput and latency.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128848197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
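The placement problem identified here can be sketched with a greedy heuristic: assign each acker task to the worker currently running the fewest tasks, so ackers do not pile onto nodes already saturated by heavy operator tasks. This is an illustrative stand-in under invented names, not one of the paper's two proposed strategies.

```python
def place_ackers(worker_tasks, n_ackers):
    """worker_tasks: dict mapping worker name -> number of operator tasks.
    Greedily place n_ackers acker tasks on the least-loaded workers."""
    load = dict(worker_tasks)          # running total of tasks per worker
    placement = {w: 0 for w in load}   # ackers assigned to each worker
    for _ in range(n_ackers):
        w = min(load, key=lambda k: load[k])  # least-loaded worker (ties: first)
        placement[w] += 1
        load[w] += 1
    return placement
```

Starting from loads {'w1': 4, 'w2': 1, 'w3': 1}, three ackers land on the two lightly loaded workers rather than the busy one.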
{"title":"SMR-X: Flexible Parallel State Machine Replication for Cloud Computing","authors":"Meng Zhou, Weigang Wu, Zhiguang Chen, Nong Xiao","doi":"10.1109/CCGRID.2019.00043","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00043","url":null,"abstract":"State Machine Replication (SMR) is a fundamental fault-tolerance technique for distributed systems. SMR traditionally requires sequential execution of commands at each replica node, so as to guarantee strong consistency among replicas. To achieve high performance in large-scale cloud datacenters, SMR has been parallelized by employing multiple threads at each replica. In this paper, we propose SMR-X, a novel parallel SMR scheme that realizes flexible mapping of commands for parallel execution at each replica. The mapping between clients' requests and worker threads is dynamically adjusted according to the load level of the worker threads. Therefore, the workloads of different threads can be well balanced and high system throughput can be achieved. The major challenge in our work lies in the inconsistency problem caused by dynamic changes in the request-thread mapping. To cope with this, we design careful mechanisms to synchronize the mapping function, so that strong consistency among replicas can be guaranteed. The correctness of the proposed scheme is rigorously proved and its performance is evaluated via simulations. Simulation results show that SMR-X can achieve better load balance and lower access latency than existing parallel SMR schemes.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134322738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
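The request-to-thread mapping idea can be sketched with a least-loaded dispatcher that assigns each new client to the lightest worker thread while keeping existing clients sticky. SMR-X additionally synchronizes mapping changes across replicas to preserve strong consistency, which this single-replica toy deliberately omits; names are illustrative.

```python
class DynamicMapper:
    """Toy request-to-thread mapper: new clients go to the least-loaded
    worker; existing clients stay mapped so per-client order is preserved."""

    def __init__(self, n_workers):
        self.n = n_workers
        self.load = [0] * n_workers  # requests handled per worker
        self.assign = {}             # client id -> worker index

    def route(self, client):
        if client not in self.assign:
            # Assign new clients to the currently least-loaded worker.
            self.assign[client] = min(range(self.n), key=lambda w: self.load[w])
        w = self.assign[client]
        self.load[w] += 1
        return w
```

Sticky assignment keeps a client's commands on one thread (so they execute in order), while the least-loaded rule balances new clients across threads.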
{"title":"Multivariate LSTM-Based Location-Aware Workload Prediction for Edge Data Centers","authors":"Chanh Nguyen Le Tan, C. Klein, E. Elmroth","doi":"10.1109/CCGRID.2019.00048","DOIUrl":"https://doi.org/10.1109/CCGRID.2019.00048","url":null,"abstract":"Mobile Edge Clouds (MECs) are a promising computing platform to overcome the challenges facing bandwidth-hungry, latency-critical applications by distributing computing and storage capacity to the edge of the network as Edge Data Centers (EDCs) in close vicinity of end users. Due to the heterogeneous, distributed resource capacity of EDCs and the flexibility of application deployment coupled with user mobility, MECs pose significant challenges for controlling resource allocation and provisioning. In order to develop a self-managed system for MECs that efficiently decides how much and when to activate scaling, and where to place and migrate services, it is crucial to predict workload characteristics, including variations over time and locality. To this end, we present a novel location-aware workload predictor for EDCs. Our approach leverages the correlation among workloads of EDCs in close physical distance and applies a multivariate Long Short-Term Memory network to achieve on-line workload predictions for each EDC. Experiments with two real mobility traces show that our proposed approach achieves better prediction accuracy than a state-of-the-art location-unaware method (up to 44%) and a location-aware method (up to 17%). Further, through intensive performance measurements using various input shaking methods, we substantiate that the proposed approach achieves reliable and consistent performance.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"110 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115737061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
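The location-aware idea — predicting an EDC's load from its own history plus a nearby EDC's — can be illustrated with a two-feature least-squares predictor standing in for the paper's multivariate LSTM. A hedged toy under synthetic data: it solves the 2x2 normal equations directly, and every name is invented for the example.

```python
def fit_two_feature_predictor(own, neighbor, target):
    """Least-squares fit of target ~= a*own + b*neighbor (no intercept),
    e.g. next-step load from an EDC's own load and a nearby EDC's load.
    Solves the 2x2 normal equations in closed form."""
    s_oo = sum(o * o for o in own)
    s_nn = sum(n * n for n in neighbor)
    s_on = sum(o * n for o, n in zip(own, neighbor))
    s_ot = sum(o * t for o, t in zip(own, target))
    s_nt = sum(n * t for n, t in zip(neighbor, target))
    det = s_oo * s_nn - s_on * s_on  # must be nonzero (features not collinear)
    a = (s_ot * s_nn - s_nt * s_on) / det
    b = (s_oo * s_nt - s_on * s_ot) / det
    return a, b
```

On exactly linear synthetic data the fit recovers the generating coefficients, which is enough to show how a neighbor's workload contributes predictive signal.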