{"title":"COCSN: A Multi-Tiered Cascaded Optical Circuit Switching Network for Data Center","authors":"Shuo Li;Huaxi Gu;Xiaoshan Yu;Hua Huang;Songyan Wang;Zeshan Chang","doi":"10.1109/TCC.2024.3488275","DOIUrl":"https://doi.org/10.1109/TCC.2024.3488275","url":null,"abstract":"A cascaded network represents a classic scaling-out model in traditional electrical switching networks. Recent proposals have integrated optical circuit switching at specific tiers of these networks to reduce power consumption and enhance topological flexibility. Utilizing a multi-tiered cascaded optical circuit switching network is expected to extend the advantages of optical circuit switching further. The main challenges fall into two categories. First, an architecture with sufficient connectivity is required to support varying workloads. Second, network reconfiguration is more complex and necessitates a low-complexity scheduling algorithm. In this work, we propose COCSN, a multi-tiered cascaded optical circuit switching network architecture for data centers. COCSN employs wavelength-selective switches (WSSs) that integrate multiple wavelengths to enhance network connectivity. We formulate a mathematical model covering lightpath establishment, network reconfiguration, and reconfiguration goals, and propose theorems to optimize the model. Based on these theorems, we introduce an over-subscription-supported wavelength-by-wavelength scheduling algorithm that enables agile establishment of lightpaths in COCSN tailored to communication demand. This algorithm effectively addresses scheduling complexities and mitigates the issue of lengthy WSS configuration times. Simulation studies investigate the impact of flow length, WSS reconfiguration time, and communication domain on COCSN, verifying its significantly lower complexity and superior performance over classical cascaded networks.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1463-1475"},"PeriodicalIF":5.3,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group Formation and Sampling in Group-Based Hierarchical Federated Learning","authors":"Jiyao Liu;Xuanzhang Liu;Xinliang Wei;Hongchang Gao;Yu Wang","doi":"10.1109/TCC.2024.3482865","DOIUrl":"https://doi.org/10.1109/TCC.2024.3482865","url":null,"abstract":"Hierarchical federated learning has emerged as a pragmatic approach to addressing scalability, robustness, and privacy concerns within distributed machine learning, particularly in the context of edge computing. This hierarchical method involves grouping clients at the edge, where the constitution of client groups significantly impacts overall learning performance, influenced by both the benefits obtained and costs incurred during group operations (such as group formation and group training). This is especially true for edge and mobile devices, which are more sensitive to computation and communication overheads. The formation of groups is critical for group-based hierarchical federated learning but is often neglected by researchers, especially in the realm of edge systems. In this paper, we present a comprehensive exploration of a group-based federated edge learning framework utilizing the hierarchical cloud-edge-client architecture and employing probabilistic group sampling. Our theoretical analysis of its convergence rate, considering the characteristics of client groups, reveals the pivotal role played by group heterogeneity in achieving convergence. Building on this insight, we introduce new methods for group formation and group sampling, aiming to mitigate data heterogeneity within groups and enhance the convergence and overall performance of federated learning. Our proposed methods are validated through extensive experiments, demonstrating their superiority over current algorithms in terms of prediction accuracy and training cost.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1433-1448"},"PeriodicalIF":5.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aggregate Monitoring for Geo-Distributed Kubernetes Cluster Federations","authors":"Chih-Kai Huang;Guillaume Pierre","doi":"10.1109/TCC.2024.3482574","DOIUrl":"https://doi.org/10.1109/TCC.2024.3482574","url":null,"abstract":"Distributed monitoring is an essential function that allows large cluster federations to efficiently schedule applications on a set of available geo-distributed resources. However, periodically reporting the precise status of each available server is both unnecessary for accurate scheduling and unscalable as the number of servers grows. This paper proposes Acala, an aggregate monitoring framework for geo-distributed Kubernetes cluster federations which aims to provide the management cluster with aggregated information about the entire cluster instead of individual servers. Based on actual deployment under a controlled environment in the geo-distributed Grid’5000 testbed, our evaluations show that Acala reduces the cross-cluster network traffic by up to 97% and the scrape duration by up to 55% in the single-member-cluster experiment. Our solution also decreases cross-cluster network traffic by 95% and memory resource consumption by 83% in multiple-member-cluster scenarios. A comparison of scheduling efficiency with and without data aggregation shows that aggregation has minimal effects on the system’s scheduling function. These results indicate that our approach is superior to the existing solution and is suitable for handling large-scale geo-distributed Kubernetes cluster federation environments.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1449-1462"},"PeriodicalIF":5.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HEXO: Offloading Long-Running Compute- and Memory-Intensive Workloads on Low-Cost, Low-Power Embedded Systems","authors":"Pierre Olivier;A K M Fazla Mehrab;Sandeep Errabelly;Stefan Lankes;Mohamed Lamine Karaoui;Robert Lyerly;Sang-Hoon Kim;Antonio Barbalace;Binoy Ravindran","doi":"10.1109/TCC.2024.3482178","DOIUrl":"https://doi.org/10.1109/TCC.2024.3482178","url":null,"abstract":"OS-capable embedded systems with very low power consumption are available at an extremely low price point. This makes them highly compelling in a datacenter context. We show that sharing long-running, compute-intensive datacenter workloads between a server machine and one or a few connected embedded boards of negligible cost and power consumption can yield significant performance and energy benefits. Our approach, named Heterogeneous EXecution Offloading (HEXO), selectively offloads Virtual Machines (VMs) from server-class machines to embedded boards. Our design tackles several challenges. We address the Instruction Set Architecture (ISA) difference between typical servers (x86) and embedded systems (ARM) through hypervisor and guest OS-level support for heterogeneous-ISA runtime VM migration. We cope with the limited resources of embedded systems by using lightweight VMs – unikernels – and by using the server's free RAM as remote memory for embedded boards through a transparent lightweight memory disaggregation mechanism for heterogeneous server-embedded clusters, called Netswap. VMs are offloaded based on an estimate of the slowdown expected from running on a given board. We build a prototype of HEXO and demonstrate significant increases in throughput (up to 67%) and energy efficiency (up to 56%) using benchmarks representative of compute-intensive long-running workloads.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1415-1432"},"PeriodicalIF":5.3,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Offloading and Resource Allocation for Collaborative Cloud Computing With Dependent Subtask Scheduling on Multi-Core Server","authors":"Zihan Gao;Peixiao Zheng;Wanming Hao;Shouyi Yang","doi":"10.1109/TCC.2024.3481039","DOIUrl":"https://doi.org/10.1109/TCC.2024.3481039","url":null,"abstract":"Collaborative cloud computing (CCC) has emerged as a promising paradigm to support computation-intensive and delay-sensitive applications by leveraging MEC and MCC technologies. However, the coupling between multiple variables and subtask dependencies within an application poses significant challenges to the computation offloading mechanism. To address this, we investigate the computation offloading problem for CCC by jointly optimizing offloading decisions, resource allocation, and subtask scheduling across a multi-core edge server. First, we exploit latency to design a subtask dependency model within the application. Next, we formulate a System Energy-Time Cost (SETC) minimization problem that considers the trade-off between time and energy consumption while satisfying subtask dependencies. Because solving the formulated problem directly is complex, we decompose it and propose two offloading algorithms, namely Maximum Local Searching Offloading (MLSO) and Sequential Searching Offloading (SSO), to jointly optimize offloading decisions and resource allocation. We then model dependent subtask scheduling across the multi-core edge server as a Job-Shop Scheduling Problem (JSSP) and propose a Genetic-based Task Scheduling (GTS) algorithm to achieve optimal dependent subtask scheduling on the multi-core edge server. Finally, our simulation results demonstrate the effectiveness of the proposed MLSO, SSO, and GTS algorithms under different parameter settings.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1401-1414"},"PeriodicalIF":5.3,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAM: A Resource-Aware DDoS Attack Mitigation Framework in Clouds","authors":"Fangyuan Xing;Fei Tong;Jialong Yang;Guang Cheng;Shibo He","doi":"10.1109/TCC.2024.3480194","DOIUrl":"https://doi.org/10.1109/TCC.2024.3480194","url":null,"abstract":"Distributed Denial of Service (DDoS) attacks threaten cloud servers by flooding them with redundant requests, leading to system resource exhaustion and the shutdown of legitimate services. Existing DDoS attack mitigation mechanisms mainly rely on resource expansion, which may result in unexpected resource over-provisioning and accordingly increase cloud system costs. To effectively mitigate DDoS attacks without consuming extra resources, the main challenge lies in balancing incoming requests against available cloud resources. This paper proposes a resource-aware DDoS attack mitigation framework named RAM, which employs the feedback mechanism of control theory to adaptively adjust the interaction between incoming requests and available cloud resources. Specifically, two indicators are designed: request confidence level and maximum cloud workload. Based on these two indicators, incoming requests are classified using a proportional-integral-derivative (PID) feedback-control-based classification scheme with request determination adaptation. The incoming requests are subsequently processed according to their confidence levels as well as the workload and available resources of cloud servers, which achieves effective resource-aware mitigation of DDoS attacks. Extensive experiments have been conducted to verify the effectiveness of RAM, and they demonstrate that the proposed RAM can improve request classification performance and guarantee the quality of service.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1387-1400"},"PeriodicalIF":5.3,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimizing Response Delay in UAV-Assisted Mobile Edge Computing by Joint UAV Deployment and Computation Offloading","authors":"Jianshan Zhang;Haibo Luo;Xing Chen;Hong Shen;Longkun Guo","doi":"10.1109/TCC.2024.3478172","DOIUrl":"https://doi.org/10.1109/TCC.2024.3478172","url":null,"abstract":"As a promising technique for offloading computation tasks from mobile devices, Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC) utilizes UAVs as computational resources. A popular method for enhancing the quality of service (QoS) of UAV-assisted MEC systems is to jointly optimize UAV deployment and computation task offloading. This imposes the challenge of dynamically adjusting UAV deployment and computation offloading to accommodate the changing positions and computational requirements of mobile devices. Due to the real-time requirements of MEC computation tasks, finding an efficient joint optimization approach is imperative. This paper proposes an algorithm aimed at minimizing the average response delay in a UAV-assisted MEC system. The approach revolves around the joint optimization of UAV deployment and computation offloading through convex optimization. We break down the problem into three sub-problems: UAV deployment, Ground Device (GD) access, and computation task offloading, which we address using the block coordinate descent algorithm. Given the NP-hardness of the original problem, we present near-optimal solutions to the decomposed sub-problems. Simulation results demonstrate that our approach can generate a joint optimization solution within seconds and reduce the average response delay compared to state-of-the-art and other advanced algorithms, with improvements ranging from 4.70% to 42.94%.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1372-1386"},"PeriodicalIF":5.3,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CloudBrain-ReconAI: A Cloud Computing Platform for MRI Reconstruction and Radiologists’ Image Quality Evaluation","authors":"Yirong Zhou;Chen Qian;Jiayu Li;Zi Wang;Yu Hu;Biao Qu;Liuhong Zhu;Jianjun Zhou;Taishan Kang;Jianzhong Lin;Qing Hong;Jiyang Dong;Di Guo;Xiaobo Qu","doi":"10.1109/TCC.2024.3476418","DOIUrl":"https://doi.org/10.1109/TCC.2024.3476418","url":null,"abstract":"Efficient collaboration between engineers and radiologists is important for image reconstruction algorithm development and image quality evaluation in magnetic resonance imaging (MRI). Here, we develop CloudBrain-ReconAI, an online cloud computing platform for algorithm deployment and fast, blind reader studies. This platform supports online image reconstruction using state-of-the-art artificial intelligence and compressed sensing algorithms, with applications in fast imaging (Cartesian and non-Cartesian sampling) and high-resolution diffusion imaging. By visiting the website, radiologists can easily score and mark images, after which automatic statistical analysis is provided.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1359-1371"},"PeriodicalIF":5.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs","authors":"Aditya Dhakal;Sameer G. Kulkarni;K. K. Ramakrishnan","doi":"10.1109/TCC.2024.3476210","DOIUrl":"https://doi.org/10.1109/TCC.2024.3476210","url":null,"abstract":"Hardware accelerators such as GPUs are required for real-time, low-latency inference with Deep Neural Networks (DNNs). Providing inference services in the cloud can be resource intensive, and effectively utilizing accelerators in the cloud is important. Spatial multiplexing of the GPU, while limiting the GPU resources (GPU%) given to each DNN to the right amount, leads to higher GPU utilization and higher inference throughput. Right-sizing the GPU for each DNN, optimally batching requests to balance throughput and service level objectives (SLOs), and maximizing throughput by appropriately scheduling DNNs remain significant challenges. This article introduces a dynamic and fair spatio-temporal scheduler (D-STACK) that lets multiple DNNs run concurrently on the GPU. We develop and validate a model that estimates the parallelism each DNN can utilize and a lightweight optimization formulation to find an efficient batch size for each DNN. Our holistic inference framework provides high throughput while meeting application SLOs. We compare D-STACK with other GPU multiplexing and scheduling methods (e.g., NVIDIA Triton, Clipper, Nexus), using popular DNN models. Our controlled experiments multiplexing several popular DNN models achieve up to 1.6× improvement in GPU utilization and up to 4× improvement in inference throughput.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1344-1358"},"PeriodicalIF":5.3,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FaaSCtrl: A Comprehensive-Latency Controller for Serverless Platforms","authors":"Abhisek Panda;Smruti R. Sarangi","doi":"10.1109/TCC.2024.3473015","DOIUrl":"https://doi.org/10.1109/TCC.2024.3473015","url":null,"abstract":"Serverless computing systems have become very popular because of their natural advantages with respect to auto-scaling, load balancing, and fast distributed processing. Today, almost all serverless systems define two QoS classes: best-effort (BE) and latency-sensitive (LS). Systems typically do not offer any latency or QoS guarantees for BE jobs and run them on a best-effort basis. In contrast, systems strive to minimize the processing time for LS jobs. This work proposes a precise definition for these job classes and argues that we need to consider a bouquet of performance metrics for serverless applications, not just a single one. We thus propose the comprehensive latency (CL) metric, which comprises the mean, tail latency, median, and standard deviation of a series of invocations for a given serverless function. Next, we design a system, FaaSCtrl, whose main objective is to ensure that every component of the CL stays within a prespecified limit for an LS application, while for BE applications these components are minimized on a best-effort basis. Given the sheer complexity of the scheduling problem in a large multi-application setup, we use the method of surrogate functions in optimization theory to design a simpler optimization problem that relies on performance and fairness. We rigorously establish the relevance of these metrics through characterization studies. Instead of using standard approaches based on optimization theory, we use a much faster reinforcement learning (RL)-based approach to tune the knobs that govern process scheduling in Linux, namely the real-time priority and the assigned number of cores. RL works well in this scenario because the benefit of a given optimization is probabilistic in nature, owing to the inherent complexity of the system. Using rigorous experiments on a set of real-world workloads, we show that FaaSCtrl achieves its objectives for both LS and BE applications and outperforms the state of the art by 36.9% (for tail response latency) and 44.6% (for the standard deviation of response latency) for LS applications.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1328-1343"},"PeriodicalIF":5.3,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}