{"title":"ViperProbe: Rethinking Microservice Observability with eBPF","authors":"Joshua Levin, Theophilus A. Benson","doi":"10.1109/CloudNet51028.2020.9335808","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335808","url":null,"abstract":"Recent shifts to microservice-based architectures and the supporting service mesh radically disrupt the landscape of performance-oriented management tasks. While the adoption of frameworks like Istio and Kubernetes eases the management and organization of such systems, these frameworks do not themselves provide strong observability. Microservice observability requires diverse, highly specialized, and often adaptive metrics and algorithms to monitor both the health of individual services and the larger application. However, modern metrics collection frameworks are relatively static and rigid. We introduce ViperProbe, an eBPF-based microservices collection framework that provides (1) dynamic sampling and (2) collection of deep, diverse, and precise system metrics. ViperProbe builds on the observation that the adoption of a common set of design patterns, e.g., the service mesh, enables offline analysis. By examining the performance profile of these patterns before deploying to production, ViperProbe can effectively reduce the set of collected metrics, thereby improving the efficiency and effectiveness of those metrics. To the best of our knowledge, ViperProbe is the first scalable eBPF-based dynamic and adaptive microservices metrics collection framework. Our results show ViperProbe has limited overhead while being significantly more effective for traditional management tasks, e.g., horizontal autoscaling.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125919318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bi-directional Flow Relocation For Computation Offloading With Multiple Network Functions","authors":"Koji Sugisono, S. Kawano, Akihiro Okada","doi":"10.1109/CloudNet51028.2020.9335798","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335798","url":null,"abstract":"When relocating bi-directional service flows in mobile services with computation offloading, the round-trip delay of in-flight service packets should be kept short to provide swift responses. However, when a service flow passes through multiple network functions (NFs), the in-flight service packets incur additional transmission delay every time they arrive at an NF that is moving the flow's state. The additional delay depends on when each NF transmits its state and on the packets' direction, so optimizing the delay extension of packets headed in one direction does not suffice to suppress the round-trip time extension of the bi-directional service flow. We propose a scheduling method for moving the flow states that ensures each in-flight packet waits at most once. The method executes the state migration for each NF sequentially and drains all the delayed packets before starting to move the next flow state. Numerical evaluation results show that our method effectively reduces the waiting times of the in-flight packets.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"13 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114130831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooperative Rule Caching for SDN Switches","authors":"Ori Rottenstreich, A. Kulik, Ananya Joshi, J. Rexford, G. Rétvári, D. Menasché","doi":"10.1109/CloudNet51028.2020.9335795","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335795","url":null,"abstract":"Despite the tremendous success of SDNs in datacenters, their wide adoption still poses a key challenge: the packet-forwarding rules in switches require fast and power-hungry memories. Rule tables, which serve as caches, are of limited size in cheap and energy-constrained devices, motivating novel solutions to achieve high hit rates. In this paper, we leverage device connectivity in the fast data plane, where delays are on the order of a few milliseconds, and propose that multiple switches work together to avoid accessing the control plane, where delays are orders of magnitude greater. Since caching a low-priority rule entails caching the higher-priority rules it depends on, we pose the problem of cooperative caching with dependencies. We provide models and algorithms for cooperative rule caching with dependencies, accounting for dependencies among rules implied by existing switch memory types. We develop caching algorithms for several typical use cases and study the difficulty of finding an optimal cooperative rule placement as a function of the matching pattern, laying the foundations of cooperative caching with dependencies.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124570720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECML: Improving Efficiency of Machine Learning in Edge Clouds","authors":"Aditya Dhakal, Sameer G. Kulkarni, K. Ramakrishnan","doi":"10.1109/CloudNet51028.2020.9335804","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335804","url":null,"abstract":"Edge cloud data centers (Edge) are deployed to provide responsive services to end-users. Edge can host more powerful CPUs and DNN accelerators such as GPUs and may be used for offloading tasks from end-user devices that require more significant compute capabilities. But Edge resources may also be limited and must be shared across multiple applications that process requests concurrently from several clients. However, multiplexing GPUs across applications is challenging. With edge cloud servers needing to process a large volume of streaming data and the advent of multi-GPU systems, getting that data from the network to the GPU can be a bottleneck, limiting the amount of work the GPU cluster can do. The lack of prompt notification of job completion from the GPU can also result in poor GPU utilization. We build on our recent work on controlled spatial sharing of a single GPU to support multi-GPU systems and propose a framework that addresses these challenges. Unlike the state-of-the-art uncontrolled spatial sharing currently available with systems such as CUDA-MPS, our controlled spatial sharing approach uses each GPU in the cluster efficiently by removing interference between applications, resulting in much better, predictable inference latency. We also use each cluster GPU's DMA engines to offload data transfers to the GPU complex, thereby preventing the CPU from becoming the bottleneck. Finally, our framework uses the CUDA event library to provide timely, low-overhead GPU notifications. Our evaluations show we can achieve low DNN inference latency and improve DNN inference throughput by at least a factor of 2.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121888234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtualized Network Graph Design and Embedding Model","authors":"Takehiro Sato, T. Kurimoto, S. Urushidani, E. Oki","doi":"10.1109/CloudNet51028.2020.9335799","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335799","url":null,"abstract":"This paper proposes an optimization model for virtualized network graph design and embedding (VNDE), which is applied to a scenario in which a single entity fulfills the roles of both a service provider and an infrastructure provider. The VNDE model determines the number of virtual routers (VRs) and a virtual network (VN) graph for each VN request in conjunction with the VN embedding. Numerical results show that the proposed model designs VN graphs and embeds them into the substrate infrastructure according to the volume of traffic demands and access path cost.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132317209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Scalable On-chip Switch Architecture with Quality of Service Support for Hardware Accelerated Cloud Data Centers","authors":"Fatih Yazıcı, Ayhan Sefa Yıldız, Alper Yazar, E. G. Schmidt","doi":"10.1109/CloudNet51028.2020.9335788","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335788","url":null,"abstract":"This paper proposes a scalable on-chip packet switch architecture, ACCLOUD-SWITCH, for hardware accelerated cloud data centers. The proposed switch architecture adopts architectural features from high-speed computer network and network-on-chip (NoC) routers. ACCLOUD-SWITCH interconnects heterogeneous high-speed interfaces and is implemented on an FPGA. The switch fabric runs at line speed for scalability. We propose a new work-conserving fabric arbiter that can allocate bandwidth to input/output pairs by prioritizing the switch ports, and a new hybrid buffer structure for ports connected to reconfigurable regions for more efficient memory use. The switch is implemented on a Xilinx Zynq SoC device to operate at 40 Gbps. Our simulation results demonstrate the benefits of the proposed arbiter and the hybrid buffer structure.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122638159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"POX-PLUS: An SDN Controller with Dynamic Shortest Path Routing","authors":"M. Alshammari, A. Rezgui","doi":"10.1109/CloudNet51028.2020.9335792","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335792","url":null,"abstract":"Routing in SDNs exploits the controller's global view and computes paths either using a single-source shortest path algorithm (e.g., Dijkstra, Bellman-Ford) or an all-pairs shortest path (APSP) algorithm (e.g., Floyd-Warshall). Existing APSP routing algorithms for SDNs have substantial performance limitations in handling changes in the routes due to link deletion (failure) and link insertion (recovery). In this paper, we present POX-PLUS, a new SDN controller based on the popular POX controller. POX-PLUS includes a new routing module called DR-APSP (Dynamic Routing based on All Pairs Shortest Paths) that computes and efficiently maintains shortest paths between nodes in the SDN.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115424997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Optimization for Probabilistic Protection with Multiple Types of Resources in Cloud","authors":"Mitsuki Ito, Fujun He, E. Oki","doi":"10.1109/CloudNet51028.2020.9335810","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335810","url":null,"abstract":"This paper proposes a robust optimization model for probabilistic protection with multiple types of resources to minimize the required backup capacity for each type of resource against multiple random failures of physical machines in a cloud provider. If random failures occur, the required capacities for virtual machines are allocated to the dedicated backup physical machines, which are determined in advance. Probabilistic protection restricts the probability that the workload caused by failures exceeds the backup capacity by a given survivability parameter. We introduce three survivability parameters: for central processing unit (CPU), for memory, and for the entire cloud provider considering both CPU and memory. By using the relationship among the three survivability parameters, the proposed model guarantees probabilistic protection for each resource, CPU and memory, and for the entire cloud provider. By adopting the robust optimization technique, we formulate the proposed model as a multi-objective mixed integer linear programming problem. To deal with the multi-objective optimization problem, we apply the lexicographic weighted Tchebycheff method, with which a Pareto optimal solution is obtained. Our proposed model reduces the average of the CPU and memory backup capacity ratios compared with the conventional model.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"478 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115216600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Roadblocks of I/O Parallelization: Removing H/W Contentions by Static Role Assignment in VNFs","authors":"Masahiro Asada, Ryota Kawashima, Hiroki Nakayama, Tsunemasa Hayashi, H. Matsuo","doi":"10.1109/CloudNet51028.2020.9335803","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335803","url":null,"abstract":"Achieving 100 Gbps+ throughput with commodity servers is a challenging goal, even with the state-of-the-art Data Plane Development Kit (DPDK). The fundamental performance of CPU/memory is now the bottleneck, and simple code optimization of Network Functions (NFs) cannot be the solution. Hardware accelerators including FPGAs are getting attention for performance boosts; however, relying on specific features degrades the manageability of NFV-nodes. Common Receive Side Scaling (RSS) provides a means of H/W-level parallelization, but per-flow throughput is not accelerated. Existing software-based approaches distribute the processing load of NFs, but I/O is still serialized for each datapath. We tackled I/O parallelization in our previous study and uncovered puzzling contentions. Specifically, per-thread CPU cycle consumption grew proportionally as the parallelization level increased, although the overhead of conceivable mutual exclusions (e.g., CAS operations) was trivial. In this paper, we pursue the cause of the issue and upgrade our I/O parallelization scheme. Our careful investigation of NFV-node internals, ranging from the application to the device driver layers, indicates that hidden H/W-level contentions involving DMA heavily consume CPU cycles. We propose a contention-avoidance design of thread role assignment and show that our design can flatten per-thread CPU cycle consumption.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128913351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting IoT Applications with Serverless Edge Clouds","authors":"I. Wang, Elizabeth Liri, K. Ramakrishnan","doi":"10.1109/CloudNet51028.2020.9335805","DOIUrl":"https://doi.org/10.1109/CloudNet51028.2020.9335805","url":null,"abstract":"Cloud computing has grown because of lowered costs due to economies of scale and multiplexing. Serverless computing further exploits multiplexing in cloud computing; however, to achieve the low latency required by IoT applications, the cloud should be moved nearer to the IoT device and the cold start problem should be addressed. Using a real-world dataset, we showed through an implementation in an open-source cloud environment based on Knative that a serverless approach to managing IoT traffic is feasible, uses fewer resources than a serverful approach, and that traffic prediction with prefetching can mitigate the cold start delay penalty. However, applying the Knative framework directly to IoT traffic without considering the execution context incurs unnecessary overhead.","PeriodicalId":156419,"journal":{"name":"2020 IEEE 9th International Conference on Cloud Networking (CloudNet)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130836019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}