{"title":"Delay-Sensitive Task Offloading Optimization by Geometric Programming","authors":"Mohammad Fathi;Mohammad Saroughi;Azarhedi Zareie","doi":"10.1109/TCC.2024.3406384","DOIUrl":"10.1109/TCC.2024.3406384","url":null,"abstract":"Mobile cloud computing is an emerging technology to address the resource limitation of mobile terminals. These terminals need to satisfy the performance requirements of emerging resource-consuming applications. Among these applications, delay-sensitive applications are becoming popular with the requirements of low execution times. Satisfying the delay requirements of these applications is the main objective in the task offloading of mobile cloud computing. In this paper, considering a network of wireless and wired infrastructures, a resource allocation problem in the form of a non-convex problem is formulated to provide a fair delay for offloaded tasks by delay-sensitive applications. Both transmission and computation delays are included in the formulation of the offloading delay. To tackle the problem's complexity, the assignment of mobile terminals to radio access networks and cloud servers is done by proposing greedy assignment solutions. The derived problem which is a geometric programming problem is then solved using convex programming. The performance of the proposed solution is evaluated versus the number of mobile terminals with different values of bandwidth resources at the radio network, workloads, and demand CPU cycles at mobile terminals. 
Numerical results demonstrate the effectiveness of the proposed solution to decrease the offloading delay in comparison with similar schemes.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 3","pages":"889-896"},"PeriodicalIF":5.3,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141192560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Data Locality of Tasks by Executor Allocation in Spark Computing Environment","authors":"Zhongming Fu;Mengsi He;Yang Yi;Zhuo Tang","doi":"10.1109/TCC.2024.3406041","DOIUrl":"10.1109/TCC.2024.3406041","url":null,"abstract":"The concept of data locality is crucial for distributed systems (e.g., Spark and Hadoop) to process Big Data. Most existing research optimizes data locality from the perspective of task scheduling. However, because executors are the execution containers of Spark's tasks, the nodes on which executors are launched directly affect the data locality achieved by the tasks. This article tries to improve the data locality of tasks by executor allocation in the Spark framework. First, because communication modes differ across stages, we separately model the communication cost of tasks for transferring input data to the executors. We then formalize an optimal executor allocation problem to minimize the total communication cost of transferring all input data. This problem is proven to be NP-hard. Finally, we present a greedy dropping heuristic algorithm to provide a solution to the executor allocation problem. Our proposals are implemented in Spark-3.4.0, and their performance is evaluated through representative micro-benchmarks (i.e., WordCount, Join, Sort) and macro-benchmarks (i.e., PageRank and LDA). Extensive experiments show that the proposed executor allocation strategy can decrease the network traffic and data access time by improving the data locality during task scheduling. 
Its performance benefits are particularly significant for iterative applications.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 3","pages":"876-888"},"PeriodicalIF":5.3,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141192515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Task Offloading in Edge Computing Based on Dependency-Aware Reinforcement Learning","authors":"Xiangchun Chen;Jiannong Cao;Yuvraj Sahni;Shan Jiang;Zhixuan Liang","doi":"10.1109/TCC.2024.3381646","DOIUrl":"10.1109/TCC.2024.3381646","url":null,"abstract":"Collaborative edge computing (CEC) is an emerging computing paradigm in which edge nodes collaborate to perform tasks from end devices. Task offloading decides when and at which edge node tasks are executed. Most existing studies assume task profiles and network conditions are known in advance, an assumption that adapts poorly to dynamic real-world computation environments. Some learning-based methods use online task offloading without considering task dependency and network flow scheduling, leading to underutilized resources and flow congestion. We study Online Dependent Task Offloading (ODTO) in CEC, jointly optimizing network flow scheduling to improve quality of service by reducing task completion time and energy consumption. The challenge of ODTO lies in how to offload dependent tasks and schedule network flows in dynamic networks. We model ODTO as a Markov Decision Process (MDP) and propose an Asynchronous Deep Progressive Reinforcement Learning (ADPRL) approach that optimizes offloading and bandwidth decisions. We design a novel dependency-aware reward mechanism to address task dependency and dynamic networks. 
Extensive experiments on the Alibaba cluster trace dataset and synthetic dataset indicate that our algorithm outperforms heuristic and learning-based methods in average task completion time and energy consumption.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"594-608"},"PeriodicalIF":6.5,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140314864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Reinforcement Learning Based Dynamic Flowlet Switching for DCN","authors":"Xinglong Diao;Huaxi Gu;Wenting Wei;Guoyong Jiang;Baochun Li","doi":"10.1109/TCC.2024.3382132","DOIUrl":"10.1109/TCC.2024.3382132","url":null,"abstract":"Flowlet switching has been proven to be an effective technology for fine-grained load balancing in data center networks. However, flowlet detection based on static flowlet timeout values lacks accuracy and effectiveness in complex network environments. In this article, we propose a new deep reinforcement learning approach, called DRLet, to dynamically detect flowlets. DRLet offers two advantages: first, it provides dynamic flowlet timeout values to detect bursts into fine-grained flowlets; second, flowlet timeout values are automatically configured by the deep reinforcement learning agent, which only requires simple and measurable network states, instead of any prior knowledge, to achieve the pre-defined goal. With our approach, the flowlet timeout value dynamically matches the network load scenario, ensuring the accuracy and effectiveness of flowlet detection while suppressing packet reordering. Our results show that DRLet achieves superior performance compared to existing schemes based on static flowlet timeout values in both baseline and asymmetric topologies.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"580-593"},"PeriodicalIF":6.5,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140314972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Makespan and Security-Aware Workflow Scheduling for Cloud Service Cost Minimization","authors":"Liying Li;Chengliang Zhou;Peijin Cong;Yufan Shen;Junlong Zhou;Tongquan Wei","doi":"10.1109/TCC.2024.3382351","DOIUrl":"10.1109/TCC.2024.3382351","url":null,"abstract":"The market penetration of Infrastructure-as-a-Service (IaaS) in cloud computing is increasing, benefiting from its flexibility and scalability. One of the most important issues for IaaS cloud service providers is to minimize the monetary cost while meeting cloud user experience requirements such as makespan and security. Prior works on cloud service cost minimization ignore either security or makespan, both of which are very important for user experience. In this article, we propose a two-stage algorithm to solve the cloud service cost minimization problem on the premise of satisfying the security and makespan requirements of cloud users. Specifically, in the first stage, we propose a novel security service selection scheme to ensure system security by judiciously selecting security services with low cost for tasks under the constraints of time and security. In the second stage, to further reduce the cloud service cost, we design a workflow scheduling method based on an improved firefly algorithm (IFA). The IFA-based method schedules cloud service workflows to low-cost virtual machines on the premise of guaranteeing security and makespan. It can quickly find the workflow scheduling solution with minimized cost using our designed updating scheme and mapping operator. Extensive simulations are conducted on real-world workflows to verify the efficacy of the proposed two-stage method. Simulation results show that the proposed two-stage method outperforms the baseline and two benchmarking methods in terms of cost minimization without violating security and time constraints. 
Compared to benchmarking methods, the cloud service cost can be reduced by up to 57.6% by using our proposed approach.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"609-624"},"PeriodicalIF":6.5,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140314801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Live Migration of Virtual Machines Based on Dirty Page Similarity","authors":"Yucong Chen;Shuaixin Xu;Hubin Yang;Rui Zhou;Deke Guo;Qingguo Zhou","doi":"10.1109/TCC.2024.3379494","DOIUrl":"10.1109/TCC.2024.3379494","url":null,"abstract":"Pre-copy-based Virtual Machine (VM) live migration seamlessly migrates the running VM to the target physical server by pre-copying memory pages and realizing updates through loop iterations. This method, which has high reliability and robustness, can effectively achieve load balancing and reduce energy consumption. It is widely used in the industry to manage server cluster resources. However, it also involves many problems, such as many dirty memory pages resulting from repeated transmission and convergence failure of iterative transmission. Hence, pre-copy live migration cannot efficiently allocate server cluster resources. To resolve these problems, a VM pre-copy live migration technology based on the similarity of dirty memory pages is proposed in this paper. The access priority of historical dirty memory pages was determined by calculating the similarity weight based on the Hamming distance. A priority-based delay transmission scheme for high dirty pages and low dirty pages was used to decrease the frequent transmission of high dirty memory pages, increase the convergence speed of the live-migration iterative copy process, and reduce the overall migration time of VMs. 
A comparative analysis of experimental results based on six dimensions showed that the proposed method achieved better migration efficiency than the conventional live migration strategy.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"563-579"},"PeriodicalIF":6.5,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140200145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperion: Hardware-Based High-Performance and Secure System for Container Networks","authors":"Myoungsung You;Minjae Seo;Jaehan Kim;Seungwon Shin;Jaehyun Nam","doi":"10.1109/TCC.2024.3403175","DOIUrl":"10.1109/TCC.2024.3403175","url":null,"abstract":"Containers have become the predominant virtualization technique for deploying microservices in cloud environments. However, container networking, critical for microservice functionality, often introduces significant overhead and resource consumption, potentially degrading the performance of microservices. This challenge arises from the complexity of the software-based network data plane, responsible for network virtualization and access control within container traffic. To tackle this challenge, we propose Hyperion, a novel hardware-based container networking system that prioritizes high performance and security. Leveraging smartNICs, commonly found in cloud environments, Hyperion implements a fully functional container network data plane, encompassing network virtualization and access control. It also has the capability to dynamically optimize its data plane for agile responses to frequent changes in container environments, ensuring up-to-date data plane operation. This hardware-based design empowers Hyperion to significantly improve the overall container networking performance without relying on the host system resources. Notably, Hyperion seamlessly integrates with existing containerized applications without necessitating modifications. Our evaluation shows that compared to state-of-the-art solutions, Hyperion achieves significant improvements in HTTP container communication latency and throughput by up to 2.25x and 4.3x, respectively. 
Furthermore, it reduces CPU utilization associated with container networking by up to 4x.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 3","pages":"844-858"},"PeriodicalIF":5.3,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141151312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Game-Based Low Complexity and Near Optimal Task Offloading for Mobile Blockchain Systems","authors":"Junfei Wang;Jing Li;Zhen Gao;Zhu Han;Chao Qiu;Xiaofei Wang","doi":"10.1109/TCC.2024.3376394","DOIUrl":"10.1109/TCC.2024.3376394","url":null,"abstract":"The Internet of Things (IoT) finds applications across diverse fields but grapples with privacy and security concerns. Blockchain offers a remedy by instilling trust among IoT devices. The development of blockchain in IoT encounters hurdles due to its resource-intensive computation processing, notably in PoW-based systems. Cloud and edge computing can facilitate the application of blockchain in this environment, and the IoT users who want to mine in blockchain need to pay the computation resource rent to the Cloud Computing Service Provider (CCSP) for offloading the mining workload. In this scenario, these IoT miners can form groups to trade with CCSP to maximize their utility. In this paper, a mixed model of the Stackelberg game and coalition formation game is embraced to address the grouping and pricing issues between IoT miners and CCSP. In particular, the Stackelberg game is utilized to handle the pricing problem, and the coalition formation game is employed to tackle the best group partition problem. Moreover, a coalition formation algorithm is proposed to obtain a near-optimal solution with very low complexity. 
Simulation results show that our proposed algorithm can obtain a performance that is very near to the exhaustive search method, outperforms other existing schemes, and requires only a small computation overhead.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"539-549"},"PeriodicalIF":6.5,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140165520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anole: Scheduling Flows for Fast Datacenter Networks With Packet Re-Prioritization","authors":"Song Zhang;Lide Suo;Wenxin Li;Yuan Liu;Yulong Li;Keqiu Li","doi":"10.1109/TCC.2024.3376716","DOIUrl":"10.1109/TCC.2024.3376716","url":null,"abstract":"Many existing datacenter transports perform one-shot packet priority tagging at end-hosts and leave them fixed during the packet's transmission. In this article, we experimentally show that: 1) such fixed packet priority is not sufficient for FCT (flow completion time) minimization, and 2) adjusting packet transmission priority in the network requires effective coordination among switches. Building on these insights, we present Anole, a new datacenter transport that advocates packet re-prioritization in near-bottleneck switches to minimize FCT. To this end, Anole integrates three simple-yet-effective techniques. First, it employs an in-network telemetry (INT) based approach to dynamically detect the bottleneck for each flow. Second, it adopts an on-off rate control mechanism for each sender to pause heavily congested flows but send lightly- and non-congested ones. Last, it leverages an altruistic scheduling policy at each switch to let the flows whose next hops are bottleneck switches give way to others. We implement an Anole prototype based on DPDK and show, through both testbed experiments and simulations, that Anole delivers significant performance advantages. 
For example, compared to EPN, Homa, and Aeolus, it shortens the average FCT of all (small) flows by up to 61.6% (89.1%).","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"550-562"},"PeriodicalIF":6.5,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140165983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Service Recovery in NFV-Enabled Networks: Algorithm Design and Analysis","authors":"Dung H. P. Nguyen;Chih-Chieh Lin;Tu N. Nguyen;Shao-I Chu;Bing-Hong Liu","doi":"10.1109/TCC.2024.3402185","DOIUrl":"10.1109/TCC.2024.3402185","url":null,"abstract":"Network function virtualization (NFV), a novel network architecture, promises considerable convenience in network design, deployment, and management. This paradigm, although flexible, is exposed to many risks that interrupt services, such as node and link failures. Thus, resiliency is one of the requirements in NFV-enabled network design, so that network services can be recovered once failures occur. Therefore, in addition to a primary chain of virtual network functions (VNFs) for a service, one typically allocates the corresponding backup VNFs to satisfy the resiliency requirement. Nevertheless, this approach consumes network resources that could otherwise be employed to deploy more services. Moreover, one can hardly recover all interrupted services due to the limitation of network backup resources. In this context, the importance of the services is one of the factors employed to judge the recovery priority. In this article, we first assign each service a weight expressing its importance, then seek to retrieve interrupted services such that the total weight of the recovered services is maximized. Hence, we also call this issue the VNF restoration for recovering weighted services (VRRWS) problem. We next prove that the VRRWS problem is NP-hard and propose an effective technique, termed online recovery algorithm (ORA), to address the problem without necessitating the backup resources. Finally, we conduct extensive simulations to evaluate the performance of the proposed algorithm as well as the factors affecting the recovery. 
The experiment shows that the available VNFs should be migrated to appropriate nodes during the recovery process to achieve better results.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"800-813"},"PeriodicalIF":6.5,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}