{"title":"A Cost-Aware Operator Migration Approach for Distributed Stream Processing System","authors":"Jiawei Tan;Zhuo Tang;Wentong Cai;Wen Jun Tan;Xiong Xiao;Jiapeng Zhang;Yi Gao;Kenli Li","doi":"10.1109/TCC.2025.3538512","DOIUrl":"https://doi.org/10.1109/TCC.2025.3538512","url":null,"abstract":"Stream processing is integral to edge computing due to its low-latency attributes. Nevertheless, variability in user group sizes and disparate computing capabilities of edge devices necessitate frequent operator migrations within the stream. Moreover, intricate dependencies among stream operators often obscure the detection of potential bottleneck operators until an identified bottleneck is migrated in the stream. To address this, we propose a Cost-Aware Operator Migration (CAOM) scheme. The CAOM scheme incorporates a bottleneck operator detection mechanism that directly identifies all bottleneck operators based on task running metrics. This approach avoids multiple consecutive operator migrations in complex tasks, reducing the number of task interruptions caused by operator migration. Moreover, CAOM takes into account the temporal variance in operator migration costs. By factoring in the fluctuating data generation rate from data sources at different time intervals, CAOM selects the optimal start time for operator migration to minimize the amount of accumulated data during task interruptions. Finally, we implemented CAOM on Apache Flink and evaluated its performance using the WordCount and Nexmark applications. 
Our experiments show that CAOM effectively reduces the number of necessary operator migrations in tasks with complex topologies and decreases the latency overhead associated with operator migration compared to state-of-the-art schemes.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"441-454"},"PeriodicalIF":5.3,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143580855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Computation Offloading and Resource Allocation in Mobile-Edge Cloud Computing: A Two-Layer Game Approach","authors":"Zhenli He;Ying Guo;Xiaolong Zhai;Mingxiong Zhao;Wei Zhou;Keqin Li","doi":"10.1109/TCC.2025.3538090","DOIUrl":"https://doi.org/10.1109/TCC.2025.3538090","url":null,"abstract":"Mobile-Edge Cloud Computing (MECC) plays a crucial role in balancing low-latency services at the edge with the computational capabilities of cloud data centers (DCs). However, many existing studies focus on single-provider settings or limit their analysis to interactions between mobile devices (MDs) and edge servers (ESs), often overlooking the competition that occurs among ESs from different providers. This article introduces an innovative two-layer game framework that captures independent self-interested competition among MDs and ESs, providing a more accurate reflection of multi-vendor environments. Additionally, the framework explores the influence of cloud-edge collaboration on ES competition, offering new insights into these dynamics. The proposed model extends previous research by developing algorithms that optimize task offloading and resource allocation strategies for both MDs and ESs, ensuring the convergence to Nash equilibrium in both layers. 
Simulation results demonstrate the potential of the framework to improve resource efficiency and system responsiveness in multi-provider MECC environments.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"411-428"},"PeriodicalIF":5.3,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143580874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developments on the “Machine Learning as a Service for High Energy Physics” Framework and Related Cloud Native Solution","authors":"Luca Giommi;Daniele Spiga;Mattia Paladino;Valentin Kuznetsov;Daniele Bonacorsi","doi":"10.1109/TCC.2025.3535793","DOIUrl":"https://doi.org/10.1109/TCC.2025.3535793","url":null,"abstract":"Machine Learning (ML) techniques have been successfully used in many areas of High Energy Physics (HEP) and will play a significant role in the success of the upcoming High-Luminosity Large Hadron Collider (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by LHC experiments in the next decade, and this effort will require novel approaches to train and use ML models. The work presented in this paper focuses on the development of an ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTPs calls. These pipelines are executed using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. In particular, new features implemented in the framework are presented, along with updates to the architecture of an existing prototype of the MLaaS4HEP cloud service. 
This solution includes two OAuth2 proxy servers as authentication/authorization layer, a MLaaS4HEP server, an XRootD proxy server for enabling access to remote ROOT data, and the TensorFlow as a Service (TFaaS) service in charge of the inference phase.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"429-440"},"PeriodicalIF":5.3,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143580872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Verifiable Encrypted Image Retrieval With Reversible Data Hiding in Cloud Environment","authors":"Mingyue Li;Yuting Zhu;Ruizhong Du;Chunfu Jia","doi":"10.1109/TCC.2025.3535937","DOIUrl":"https://doi.org/10.1109/TCC.2025.3535937","url":null,"abstract":"With growing numbers of users outsourcing images to cloud servers, privacy-preserving content-based image retrieval (CBIR) is widely studied. However, existing privacy-preserving CBIR schemes have limitations in terms of low search accuracy and efficiency due to the use of unreasonable index structures or retrieval methods. Meanwhile, existing result verification schemes do not consider the privacy of verification information. To address these problems, we propose a new secure verification encrypted image retrieval scheme. Specifically, we design an additional homomorphic bitmap index structure by using a pre-trained CNN model with modified fully connected layers to extract image feature vectors and organize them into a bitmap. It makes the extracted features more representative and robust compared to manually designed features, and only performs vector addition during the search process, improving search efficiency and accuracy. Moreover, we design a reversible data hiding (RDH) technique with color images, which embeds the verification information into the least significant bits of the encrypted image pixels to improve the security of the verification information. 
Finally, we analyze the security of our scheme against chosen-plaintext attacks (CPA) and demonstrate its effectiveness through experiments on two real-world datasets (i.e., COCO and Flickr-25k).","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"397-410"},"PeriodicalIF":5.3,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143580871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PiCoP: Service Mesh for Sharing Microservices in Multiple Environments Using Protocol-Independent Context Propagation","authors":"Hiroya Onoe;Daisuke Kotani;Yasuo Okabe","doi":"10.1109/TCC.2025.3531954","DOIUrl":"https://doi.org/10.1109/TCC.2025.3531954","url":null,"abstract":"Continuous integration and continuous delivery require many production-like environments in a cluster for testing, staging, debugging, and previewing. In applications built on microservice architecture, sharing common microservices in multiple environments is an effective way to reduce resource consumption. Previous methods extend application layer protocols like HTTP and gRPC to propagate contexts including environment identifiers and to route requests. However, microservices also use other protocols such as MySQL, Redis, Memcached, and AMQP, and extending each protocol requires lots of effort to implement the extensions. This paper proposes PiCoP, a framework to share microservices in multiple environments by propagating contexts and routing requests independently of application layer protocols. PiCoP provides a protocol that propagates contexts by appending them to the front of each TCP byte stream and constructs a service mesh that uses the protocol to route requests. We design the protocol to make it easy to instrument into a system. 
We demonstrate that PiCoP can reduce resource usage and that it applies to a real-world application, enabling the sharing of microservices in multiple environments using any application layer protocol.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"383-396"},"PeriodicalIF":5.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143580854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Adaptive Aggregation and Resource Allocation for Hierarchical Federated Learning Systems Based on Edge-Cloud Collaboration","authors":"Yi Su;Wenhao Fan;Qingcheng Meng;Penghui Chen;Yuan'an Liu","doi":"10.1109/TCC.2025.3530681","DOIUrl":"https://doi.org/10.1109/TCC.2025.3530681","url":null,"abstract":"Hierarchical federated learning shows excellent potential for communication-computation trade-offs and reliable data privacy protection by introducing edge-cloud collaboration. Considering non-independent and identically distributed (non-IID) data across devices and edges, this article aims to minimize the final loss function under time and energy budget constraints by jointly optimizing the aggregation frequency and resource allocation. Although there is no closed-form expression relating the final loss function to the optimization variables, we divide the hierarchical federated learning process into multiple cloud intervals and analyze the convergence bound for each cloud interval. Then, we transform the initial problem into one that can be adaptively optimized in each cloud interval. We propose an adaptive hierarchical federated learning process, termed AHFLP, in which we determine the edge and cloud aggregation frequencies for each cloud interval based on estimated parameters and then optimize the CPU frequency of devices and the wireless channel bandwidth allocation at each edge. 
Simulations are conducted under different models, datasets and data distributions, and the results demonstrate the superiority of our proposed AHFLP compared with existing schemes.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"369-382"},"PeriodicalIF":5.3,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Aware Offloading of Containerized Tasks in Cloud Native V2X Networks","authors":"Estela Carmona-Cejudo;Francesco Iadanza","doi":"10.1109/TCC.2025.3529245","DOIUrl":"https://doi.org/10.1109/TCC.2025.3529245","url":null,"abstract":"In cloud-native environments, executing vehicle-to-everything (V2X) tasks in edge nodes close to users significantly reduces service end-to-end latency. Containerization further reduces resource and time consumption, and, subsequently, application latency. Since edge nodes are typically resource and energy-constrained, optimizing offloading decisions and managing edge energy consumption is crucial. However, the offloading of containerized tasks has not been thoroughly explored from a practical implementation perspective. This paper proposes an optimization framework for energy-aware offloading of V2X tasks implemented as Kubernetes pods. A weighted utility function is derived based on cumulative pod response time, and an edge-to-cloud offloading decision algorithm (ECODA) is proposed. The system's energy cost model is derived, and a closed-loop repeated reward-based mechanism for CPU adjustment is presented. An energy-aware (EA)-ECODA is proposed to solve the offloading optimization problem while adjusting CPU usage according to energy considerations. Simulations show that ECODA and EA-ECODA outperform first-in, first-served (FIFS) and EA-FIFS in terms of utility, average pod response time, and resource usage, with low computational complexity. Additionally, a real testbed evaluation of a vulnerable road user application demonstrates that ECODA outperforms Kubernetes vertical scaling in terms of service-level delay. 
Moreover, EA-ECODA significantly improves energy usage utility.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"336-350"},"PeriodicalIF":5.3,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Serverless Platform for Smart Deployment of Service Function Chains","authors":"Sheshadri K R;J. Lakshmi","doi":"10.1109/TCC.2025.3528573","DOIUrl":"https://doi.org/10.1109/TCC.2025.3528573","url":null,"abstract":"Cloud Data Centres deal with dynamic changes all the time. Networks, in particular, need to adapt their configurations to changing workloads. Given these expectations, Network Function Virtualization (NFV) using Software Defined Networks (SDNs) has realized the aspect of programmability in networks. NFVs allow network services to be programmed as software entities that can be deployed on commodity clusters in the Cloud. Being software, they inherently carry the ability to be customized to specific tenants’ requirements and thus support multi-tenant variations with ease. However, scaling in alignment with changing demands with minimal loss of service, while improving resource usage efficiency, remains a challenge. Several recent works in the literature have proposed platforms to realize Virtual Network Functions (VNFs) on the Cloud using service offerings such as Infrastructure as a Service (IaaS) and serverless computing. These approaches are limited by deployment difficulties (configuration and sizing), adaptability to performance requirements (elastic scaling), and changing workload dynamics (scaling and customization). In the current work, we propose a Hybrid Serverless Platform (HSP) to address these identified lacunae. The HSP is implemented using a combination of persistent IaaS and FaaS components. The IaaS components handle the steady state load, whereas the FaaS components activate during the dynamic change associated with scaling to minimize service loss. The HSP controller makes provisioning decisions based on Quality of Service (QoS) rules and flow statistics using an auto recommender, relieving users of sizing decisions for function deployment. 
The HSP controller design exploits data locality in SFC realization, reducing data-transfer times between VNFs. It also enables the usage of application characteristics to offer higher control over SFC deployment. A proof-of-concept realization of HSP is presented in the paper and evaluated on a representative Service Function Chain (SFC) under a dynamic workload; it shows minimal loss in flowlet service, up to 35% resource savings compared with a pure IaaS deployment, and up to 55% lower end-to-end times compared with a baseline FaaS implementation.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"351-368"},"PeriodicalIF":5.3,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CARL: Cost-Optimized Online Container Placement on VMs Using Adversarial Reinforcement Learning","authors":"Prathamesh Saraf Vinayak;Saswat Subhajyoti Mallick;Lakshmi Jagarlamudi;Anirban Chakraborty;Yogesh Simmhan","doi":"10.1109/TCC.2025.3528446","DOIUrl":"https://doi.org/10.1109/TCC.2025.3528446","url":null,"abstract":"Containerization has become popular for the deployment of applications on public clouds. Large enterprises may host hundreds of applications on thousands of containers that are placed onto Virtual Machines (VMs). Such placement decisions happen continuously as applications are updated by DevOps pipelines that deploy the containers. Managing the placement of container resource requests onto the available capacities of VMs needs to be cost-efficient. This is well-studied, and usually modelled as a multi-dimensional Vector Bin-packing Problem (VBP). Many heuristics, and recently machine learning approaches, have been developed to solve this NP-hard problem for real-time decisions. We propose CARL, a novel approach to solve VBP through Adversarial Reinforcement Learning (RL) for cost minimization. It mimics the placement behavior of an offline semi-optimal VBP solver (teacher), while automatically learning a reward function for reducing the VM costs which out-performs the teacher. It requires limited historical container workload traces to train, and is resilient to changes in the workload distribution during inferencing. We extensively evaluate CARL on workloads derived from realistic traces from Google and Alibaba for the placement of 5k–10k container requests onto 2k–8k VMs, and compare it with classic heuristics and state-of-the-art RL methods. (1) CARL is <i>fast</i>, e.g., making placement decisions at <inline-formula><tex-math>$\\approx 1900$</tex-math></inline-formula> requests/sec onto 8,900 candidate VMs. 
(2) It is <i>efficient</i>, achieving <inline-formula><tex-math>$\\approx 16\\%$</tex-math></inline-formula> lower VM costs than classic and contemporary RL methods. (3) It is <i>robust</i> to changes in the workload, offering competitive results even when the resource needs or inter-arrival time of the container requests skew from the training workload.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"321-335"},"PeriodicalIF":5.3,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}