{"title":"Data Analytics Using Two-Stage Intelligent Model Pipelining for Virtual Network Functions","authors":"T. Miyazawa, Ved P. Kafle, H. Asaeda","doi":"10.1109/CloudNet53349.2021.9657133","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657133","url":null,"abstract":"The use of machine learning (ML) technologies to predict server workloads and proactively adjust the amount of computational resource to maximize the quality of services is an enormous challenge. In this study, we introduce an ITU-T Y.3177 compliant framework for autonomous resource control and management of virtualized network infrastructures. Based on this framework, we propose (1) an architecture for a data analytics system consisting of learning and prediction components, and (2) a two-stage intelligent model pipelining mechanism for the learning component that cascades two ML models, namely nonlinear regression and multiple regression, to understand the trends of the fluctuations in CPU usage of a network node and predict the peak CPU usage of the node in the time granularity of seconds. We evaluated the proposed mechanism in an experimental network that installed in-network caching nodes as network functions. We prove that our ML models are capable of performing agile data analytics in the time granularity of seconds and can reduce the prediction errors of peak CPU usage.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124015540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Machine Learning Approach for Service Function Chain Embedding in Cloud Datacenter Networks","authors":"T. Wassing, D. D. Vleeschauwer, C. Papagianni","doi":"10.1109/CloudNet53349.2021.9657124","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657124","url":null,"abstract":"Network Functions Virtualization (NFV) is an industry effort to replace traditional hardware middleboxes with virtualized network functions (VNFs) running on general-build hardware platforms, enabling cost reduction, operational efficiency, and service agility. A Service Function Chain (SFC) constitutes an end-to-end network service, formed by chaining together VNFs in specific order. Infrastructure providers and cloud service providers try to optimally allocate computing and network resources to SFCs, in order to reduce costs and increase profit margins. The corresponding resource allocation problem, known as SFC embedding problem, is proven to be NP-hard. Traditionally the problem has been formulated as Mixed Integer Linear Program (MILP), assuming each SFC’s requirements are known a priori, while the embedding decision is based on a snapshot of the infrastructure’s load at request time. Reinforcement learning (RL) has been recently applied, showing promising results, specifically in dynamic environments, where such assumptions are considered unrealistic. However, standard RL techniques such as Q-learning might not be appropriate for addressing the problem at scale, as they are often ineffective for high-dimensional domains. On the other hand, Deep RL (DRL) algorithms can deal with high dimensional state spaces. In this paper, a Deep Q-Learning (DQL) approach is proposed to address the SFC resource allocation problem. The DQL agent utilizes a neural network for function approximation in Q-learning with experience replay learning. The simulations demonstrate that the new approach outperforms the linear programming approach. In addition, the DQL agent can perform SFC request admission control in real time.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117029916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing network performance of single-node large-scale container deployments","authors":"Conrado Boeira, M. Neves, T. Ferreto, I. Haque","doi":"10.1109/CloudNet53349.2021.9657138","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657138","url":null,"abstract":"Cloud services have shifted from complex monolithic designs to hundreds of loosely coupled microservices over the last years. These microservices communicate via pre-defined APIs (e.g., RPC) and are usually implemented on top of containers. To make the microservices model profitable, cloud providers often co-locate them on a single (virtual) machine, thus achieving high server utilization. Despite being overlooked by previous work, the challenge of providing high-quality network connectivity to multiple containers running on the same host becomes crucial for the overall cloud service performance in this scenario. For that reason, this paper focuses on identifying the overheads and bottlenecks caused by the increasing number of concurrent containers running on a single node, particularly from a networking perspective. Through an extensive set of experiments, we show that the networking performance is mostly restricted by the CPU capacity (even for I/O intensive workloads), that containers can largely suffer from interference originated from packet processing, and that proper core scheduling policies can significantly improve connection throughput. Ultimately, our findings can help to pave the way towards more efficient large-scale microservice deployments.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128029596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Longer Stay Less Priority: Flow Length Approximation Used In Information-Agnostic Traffic Scheduling In Data Center Networks","authors":"M. S. Iqbal, Chien Chen","doi":"10.1109/CloudNet53349.2021.9657148","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657148","url":null,"abstract":"Numerous scheduling approaches have been proposed to improve user experiences in a data center network (DCN) by reducing flow completion time (FCT). Mimicking the shortest job first (SJF) has been proved to be the prominent way to improve FCT. To do so, some approaches require flow size or completion time information in advance, which is not possible in scenarios like HTTP chunk transfer or database query response. Some information-agnostic schemes require involving end-hosts for counting the number of bytes sent. We present Longer Stay Less Priority (LSLP), an information-agnostic flow scheduling scheme, like Multi-Level Feedback Queue (MLFQ) scheduler in operating systems, that aims to mimic SJF using P4 switches in a DCN. LSLP considers all the flows as short flows initially and assigns them to the highest priority queue, and flows get demoted to the lower priority queues over time. LSLP estimates the active time of a flow by leveraging the state-of-the-art P4 switch’s programmable nature. LSLP estimates the active time of a group of new flows that arrive during a time interval and assigns their packets to the highest priority. At the beginning of the next time interval, arriving packets of old flows are placed one priority lower except for those already in the lowest priority queue. Therefore, short flows can be completed in the few higher priority queues while long flows are demoted to lower priority queues. We have evaluated LSLP via a series of tests and shown that its performance is comparable to the existing scheduling schemes.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134232918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GDSim: Benchmarking Geo-Distributed Data Center Schedulers","authors":"Daniel S. F. Alves, K. Obraczka, A. Kabbani","doi":"10.1109/CloudNet53349.2021.9657143","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657143","url":null,"abstract":"As cloud providers scale up their data centers and distribute them around the world to meet demand, proposing new job schedulers that take into account data center geographical distribution have been receiving considerable attention from the data center management research and practitioner community. However, testing and benchmarking new schedulers for geo-distributed data centers is complicated by the lack of a common, easily extensible experimental platform. To address this gap, we propose GDSim, an open-source job scheduling simulation environment for geo-distributed data centers that aims at facilitating development, testing, and evaluation of new geo-distributed schedulers. We showcase GDSim by using it to reproduce experiments and results for recently proposed geo-distributed job schedulers, as well as testing those schedulers under new conditions which can reveal trends that have not been previously uncovered.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129631432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Distributed Tracing to Identify Inefficient Resources Composition in Cloud Applications","authors":"Clément Cassé, Pascal Berthou, P. Owezarski, S. Josset","doi":"10.1109/CloudNet53349.2021.9657140","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657140","url":null,"abstract":"Cloud-Applications are the new industry standard way of designing Web-Applications. With Cloud Computing, Applications are usually designed as microservices, and developers can take advantage of thousands of such existing microservices, involving several hundred of cross-component communications on different physical resources. Microservices orchestration (as Kubernetes) is an automatic process, which manages each component lifecycle, and notably their allocation on the different resources of the cloud infrastructure. Whereas such automatic cloud technologies ease development and deployment, they nevertheless obscure debugging and performance analysis. In order to gain insight on the composition of services, distributed tracing recently emerged as a way to get the decomposition of the activity of each component within a cloud infrastructure. This paper aims at providing methodologies and tools (leveraging state-of-the-art tracing) for getting a wider view of application behaviours, especially focusing on application performance assessment. In this paper, we focus on using distributed traces and allocation information from microservices to model their dependencies as a hierarchical property graph. By applying graph rewriting operations, we managed to project and filter communications observed between microservices at higher abstraction layers like the machine nodes, the zones or regions. Finally, in this paper we propose an implementation of the model running on a microservices shopping application deployed on a Zonal Kubernetes cluster monitored by OpenTelemetry traces. We propose using the flow hierarchy metric on the graph model to pinpoint cycles that reveal inefficient resource composition inducing possible performance issues and economic waste.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122251525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding and Leveraging Cluster Heterogeneity for Efficient Execution of Cloud Services","authors":"S. Shukla, D. Ghosal, M. Farrens","doi":"10.1109/CloudNet53349.2021.9657128","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657128","url":null,"abstract":"Cloud warehouses are becoming increasingly heterogeneous by introducing different types of processors of varying speed and energy-efficiency. Developing an optimal strategy for distributing latency-critical service (LC-service) requests across multiple instances in a heterogeneous cluster is non-trivial. In this paper, we present a detailed analysis of the impact of cluster heterogeneity on the achieved server utilization and energy footprint to meet the required service-level latency bound (SLO) of LC-services. We develop cluster-level control plane strategies to address two forms of cluster heterogeneity - capacity and energy-efficiency. First, we propose Maximum-SLO-Guaranteed Capacity (MSG-Capacity) proportional load balancing for LC-Services to address the capacity heterogeneity and show that it can achieve higher utilization than naive performance-based heterogeneity awareness. Then, we present Efficient-First (E-First) heuristic-based Instance Scaling to address the efficiency heterogeneity. Finally, to address the bi-dimensional (capacity and energy-efficiency) heterogeneity, we superimpose the two approaches to propose Energy-efficient and MSG-Capacity (E2MC) based control-plane strategy that maximizes utilization while minimizing the energy footprint.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"36 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130319480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloud for Holography and Augmented Reality","authors":"Antonios Makris, Abderrahmane Boudi, M. Coppola, Luís Cordeiro, M. Corsini, Patrizio Dazzi, Ferran Diego Andilla, Yago González Rozas, Manos N. Kamarianakis, M. Pateraki, Thu Le Pham, Antonis I Protopsaltis, Aravindh Raman, Alessandro Romussi, Luis Rosa, Elena Spatafora, T. Taleb, T. Theodoropoulos, K. Tserpes, E. Zschau, U. Herzog","doi":"10.1109/CloudNet53349.2021.9657125","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657125","url":null,"abstract":"The paper introduces the CHARITY framework, a novel framework which aspires to leverage the benefits of intelligent, network continuum autonomous orchestration of cloud, edge, and network resources, to create a symbiotic relationship between low and high latency infrastructures. These infrastructures will facilitate the needs of emerging applications such as holographic events, virtual reality training, and mixed reality entertainment. The framework relies on different enablers and technologies related to cloud and edge for offering a suitable environment in order to deliver the promise of ubiquitous computing to the NextGen application clients. The paper discusses the main pillars that support the CHARITY vision, and provide a description of the planned use cases that are planned to demonstrate CHARITY capabilities.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121962308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Open Cloud Testbed (OCT): A Platform for Research into new Cloud Technologies","authors":"M. Zink, D. Irwin, E. Cecchet, Hakan Saplakoglu, O. Krieger, Martin C. Herbordt, M. Daitzman, Peter Desnoyers, M. Leeser, Suranga Handagala","doi":"10.1109/CloudNet53349.2021.9657109","DOIUrl":"https://doi.org/10.1109/CloudNet53349.2021.9657109","url":null,"abstract":"The NSF-funded Open Cloud Testbed (OCT) project is building and supporting a testbed for research and experimentation into new cloud platforms – the underlying software which provides cloud services to applications. Testbeds such as OCT are critical for enabling research into new cloud technologies – research that requires experiments which potentially change the operation of the cloud itself. This paper gives an overview of the Open Cloud Testbed, including an overview on the existing components OCT is based on and the description of new infrastructure and software extension. In addition, we present several use cases of OCT, including a description of FPGA-based research enabled by newly-deployed resources.","PeriodicalId":369247,"journal":{"name":"2021 IEEE 10th International Conference on Cloud Networking (CloudNet)","volume":"97 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128001209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}