{"title":"ModelOps: Cloud-Based Lifecycle Management for Reliable and Trusted AI","authors":"W. Hummer, Vinod Muthusamy, T. Rausch, Parijat Dube, Kaoutar El Maghraoui, Anupama Murthi, Punleuk Oum","doi":"10.1109/IC2E.2019.00025","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00025","url":null,"abstract":"This paper proposes a cloud-based framework and platform for end-to-end development and lifecycle management of artificial intelligence (AI) applications. We build on our previous work on platform-level support for cloud-managed deep learning services, and show how the principles of software lifecycle management can be leveraged and extended to enable automation, trust, reliability, traceability, quality control, and reproducibility of AI pipelines. Based on a discussion of use cases and current challenges, we describe a framework for managing AI application lifecycles and its key components. We also show concrete examples that illustrate how this framework enables managing and executing model training and continuous learning pipelines while infusing trusted AI principles.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129784951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maintenance Scheduling for Cloud Infrastructure with Timing Constraints of Live Migration","authors":"Shingo Okuno, Fumi Iikura, Yukihiro Watanabe","doi":"10.1109/IC2E.2019.00032","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00032","url":null,"abstract":"In this paper, we propose an implementation of a maintenance scheduler for cloud infrastructures. Live migration associated with maintenance work is important to ensure service continuity for all virtual machines in an infrastructure. However, executing the migration process when the machines are under heavy load negatively affects cloud users' businesses, such as by degrading performance and extending downtime. We can avoid this by finding an appropriate time period for live migration and performing the migration then. This idea is convenient for cloud users but inconvenient for cloud providers, that is, maintenance work should be completed as soon as possible for security reasons. To satisfy both the users' convenience and providers' requirements, we designed a maintenance scheduling problem to find the appropriate time period and to shorten the maintenance work period. Since it is a large-scale combinatorial optimization problem with complex constraints on maintenance requirements, we described the constraints by using answer set programming and implemented a maintenance scheduler on the basis of a divide-and-conquer approach to reduce the computational complexity exponentially. We evaluated our scheduler by using information on a real configuration of a commercial cloud infrastructure. While a naive approach to solving the maintenance scheduling problem could not find any feasible solutions within a realistic amount of time and memory, our implementation generated the best maintenance schedule for 1032 physical machines and 14208 virtual machines in 206 s with a memory usage of 1086 MB.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122359395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query-Driven Descriptive Analytics for IoT and Edge Computing","authors":"Moysis Symeonides, Demetris Trihinas, Z. Georgiou, G. Pallis, M. Dikaiakos","doi":"10.1109/IC2E.2019.00-12","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00-12","url":null,"abstract":"With consumers embracing the prevalence of ubiquitously connected smart devices, Edge Computing is emerging as a principal computing paradigm for latency-sensitive and in-proximity services. However, as the plethora of data generated across connected devices continues to vastly increase, the need to query the \"edge\" and derive in-time analytic insights is more evident than ever. This paper introduces our vision for a rich and declarative query model abstraction particularly tailored for the unique characteristics of Edge Computing and presents a prototype framework that realizes our vision. Towards this, the declarative query model enables users to express high-level and descriptive analytic insights, while our framework compiles, optimizes and executes the query plan decoupled from the programming model of the underlying data processing engine. Afterwards, we showcase a number of potential use-cases which stand to benefit from the realization of query-driven descriptive analytics for edge computing. We conclude by elaborating on the open challenges that still must be addressed to realize our vision and potential research opportunities for the academic community to further advance the current State-of-the-Art.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134222542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing AWS Spot Instance Pricing","authors":"Gareth George, R. Wolski, C. Krintz, J. Brevik","doi":"10.1109/IC2E.2019.00036","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00036","url":null,"abstract":"Many cloud computing vendors offer a preemptible class of service for rented virtual machines. In November 2017, Amazon.com changed the pricing mechanism for its preemptible \"spot instances\" so that prices would change more \"smoothly.\" This paper analyzes the effect of this change on spot instance prices. It examines the prices immediately before and after the mechanism change to determine the extent to which prices themselves changed. It then compares the 90-day period immediately after the change in mechanism to the next 90-day period. Finally, it compares the two most recent 90-day periods (ending on October 15, 2018). Our results indicate that in addition to smoothing prices, the mechanism change introduced generally higher prices which is a trend that continues.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124370091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Self-Managing Cloud Storage with Reinforcement Learning","authors":"Ridwan Rashid Noel, Rohit Mehra, P. Lama","doi":"10.1109/IC2E.2019.000-9","DOIUrl":"https://doi.org/10.1109/IC2E.2019.000-9","url":null,"abstract":"Cloud storage services are often associated with various performance issues due to load imbalance, interference from background tasks such as data scrubbing, backfilling, recovery, and the difference in processing capabilities of heterogeneous servers in a datacenter. This has a significant impact on a broad range of applications that are characterized by massive working sets and real-time constraints. However, it is challenging and burdensome for human operators to hand-tune various control-knobs in a cloud-scale storage cluster for maintaining optimal performance under diverse workload conditions. Our study on an open-source object-based storage system, Ceph, shows that common load balancing strategies are ineffective unless they are adapted according to workload characteristics. Furthermore, positive effects of an applied strategy may not be immediately visible. To address these challenges, we developed a machine learning based system adaptation technique that enables a cloud storage system to manage itself through load balancing and data migration with the aim of delivering optimal performance in the face of diverse workload patterns and resource bottlenecks. In particular, we applied a stochastic policy gradient based reinforcement learning technique to track performance hotspots in the storage cluster, and take appropriate corrective actions to maximize future performance under a variety of complex scenarios. For this purpose, we leveraged system-level performance monitoring and commonly available control-knobs in object-based cloud storage systems. We implemented the developed techniques to build an Adaptive Resource Management (ARM) system for object based storage cluster, and evaluated its performance on NSF Cloud's Chameleon testbed. Experiments using Cloud Object Storage Benchmark (COSBench) show that ARM improves the average read and write response time of Ceph storage cluster by up to 50% and 33% respectively, compared to the default case. It also outperforms a state-of-the-art dynamic load rebalancing technique in terms of read and write performance of Ceph storage by 43% and 36% respectively.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122294470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing the Fragmentation Problem in Distributed and Decentralized Edge Computing: A Vision","authors":"Ketan Bhardwaj, Ada Gavrilovska, Vladimir Kolesnikov, Matt Saunders, Hobin Yoon, Mugdha Bondre, M. Babu, Jacob Walsh","doi":"10.1109/IC2E.2019.00030","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00030","url":null,"abstract":"At the core of the value proposition of edge computing is the ability to put computation close enough to the data sources, on demand. However, the data sources, computational infrastructure and software services needed to come together to power emerging and future edge computing applications are fragmented across different stakeholders, each with their own incentives, policies, and constraints on resources they can afford. This fragmentation limits the ability of edge computing to guarantee to applications and data the edge which will deliver the desired benefit. In this paper, we present our vision for an Edge Exchange, a decentralized directory service for a multi-stakeholder edge, as a path forward to enabling applications to be deployed across the best available edge resources, while still providing each stakeholder with controls regarding their resource use and sharing policies.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124362458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ConfAdvisor: A Performance-centric Configuration Tuning Framework for Containers on Kubernetes","authors":"Tatsuhiro Chiba, Rina Nakazawa, H. Horii, Sahil Suneja, Seetharami R. Seelam","doi":"10.1109/IC2E.2019.00031","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00031","url":null,"abstract":"Configuration tuning of software is often a good option to improve application performance without any application code modifications. Although we can casually change configurations, it is not easy to apply optimal configurations, as optimal configurations require deep knowledge of the underlying system. This is problematic because applications with suboptimal configuration result in poor performance. As container and container management systems have emerged as an application platform on the cloud, configuration tuning becomes even more challenging because containers add more complexity to the application performance. We need to consider not only fundamental misconfiguration but also container image verification, deployment configuration, application characteristics awareness based on metrics and logs. Although previous knowledge regarding how we should tune configurations for a system software is sometimes available, knowledge about performance tuning practices is neither normalized nor reusable to expand on any advice for misconfiguration to the containers. Even in the cloud-native environment, there is no centralized service to deliver knowledge continuously to application containers nor a framework to develop a misconfiguration fix rule for a container throughout its lifetime. In this paper, we propose a performance-centric configuration tuning framework for containers on Kubernetes, named ConfAdvisor, that enables containers to achieve a higher performance by validating various misconfigurations adaptively. ConfAdvisor gives config tuning advice to application containers, images, and Kubernetes specs and also provides a development framework to build configuration validation rules. We present the design of ConfAdvisor and provide several case studies to tune application containers in the real world.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127267936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge-Cloud Orchestration: Strategies for Service Placement and Enactment","authors":"I. Petri, O. Rana, A. Zamani, Y. Rezgui","doi":"10.1109/IC2E.2019.00020","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00020","url":null,"abstract":"As devices existing at the edge of the network improve in their processing and data storage capacity, there is increasing potential to host and enact services on such devices. A workflow that was traditionally enacted on a data centre can be fragmented across both edge and data centre hosted resources. The following aspects are investigated in this work: (i) mechanisms for dividing a workflow across edge and cloud/data centre resources; (ii) service hosting environments that can be shared across edge and data centre resources; (iii) performance metrics that can influence service placement and selection. An \"edge orchestrator\" is a resource manager that makes such decisions on the behalf of a user application, and which may be centralised or distributed. An industry scenario is used to illustrate decision points that influence such choices within an edge orchestrator. The overall objective considered is the completion of the workflow within some deadline constraint by the edge orchestrator.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116110789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the SQUEET Program Chairs","authors":"","doi":"10.1109/ic2e.2019.00-17","DOIUrl":"https://doi.org/10.1109/ic2e.2019.00-17","url":null,"abstract":"","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127070012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting the End-to-End Tail Latency of Containerized Microservices in the Cloud","authors":"Joy Rahman, P. Lama","doi":"10.1109/IC2E.2019.00034","DOIUrl":"https://doi.org/10.1109/IC2E.2019.00034","url":null,"abstract":"Large-scale web services are increasingly adopting cloud-native principles of application design to better utilize the advantages of cloud computing. This involves building an application using many loosely coupled service-specific components (microservices) that communicate via lightweight APIs, and utilizing containerization technologies to deploy, update, and scale these microservices quickly and independently. However, managing the end-to-end tail latency of requests flowing through the microservices is challenging in the absence of accurate performance models that can capture the complex interplay of microservice workflows with cloud-induced performance variability and inter-service performance dependencies. In this paper, we present performance characterization and modeling of containerized microservices in the cloud. Our modeling approach aims at enabling cloud platforms to combine resource usage metrics collected from multiple layers of the cloud environment, and apply machine learning techniques to predict the end-to-end tail latency of microservice workflows. We implemented and evaluated our modeling approach on NSF Cloud's Chameleon testbed using KVM for virtualization, Docker Engine for containerization and Kubernetes for container orchestration. Experimental results with an open-source microservices benchmark, Sock Shop, show that our modeling approach achieves high prediction accuracy even in the presence of multi-tenant performance interference.","PeriodicalId":226094,"journal":{"name":"2019 IEEE International Conference on Cloud Engineering (IC2E)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129893937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}