{"title":"Locality-Aware and Fault-Tolerant Batching for Machine Learning on Distributed Datasets","authors":"Liu Liu;Zhijun Ding;Dazhao Cheng;Xiaobo Zhou","doi":"10.1109/TCC.2024.3351716","DOIUrl":"10.1109/TCC.2024.3351716","url":null,"abstract":"The performance of distributed ML training is largely determined by the workers that generate gradients at the slowest pace, i.e., the stragglers. State-of-the-art load balancing approaches assume that each worker stores a complete dataset locally and that the data fetching time can be ignored, so they consider only the computation capacity of workers when equalizing the gradient computation time. However, we find that in ML on distributed datasets, whether in edge computing or in distributed data cache systems, the data fetching time is non-negligible and often becomes the primary cause of stragglers. In this paper, we present LOFT, an adaptive load balancing approach for ML on distributed datasets at the edge. It aims to balance the time to generate gradients at each worker while ensuring model accuracy. Specifically, LOFT features locality-aware batching. It builds performance and optimization models upon data fetching and gradient computation time. Leveraging these models, it develops an adaptive scheme based on grid search. Furthermore, it offers Byzantine gradient aggregation upon Ring All-Reduce, making it fault-tolerant against the Byzantine gradients brought by a small batch size. 
Experiments with twelve public DNN models and four open datasets show that LOFT reduces the training time by up to 46%, while reducing the training loss by up to 67% compared to LB-BSP.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"370-387"},"PeriodicalIF":6.5,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139951172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BatOpt: Optimizing GPU-Based Deep Learning Inference Using Dynamic Batch Processing","authors":"Deyu Zhang;Yunzhen Luo;Yaobo Wang;Xiaoyan Kui;Ju Ren","doi":"10.1109/TCC.2024.3350561","DOIUrl":"10.1109/TCC.2024.3350561","url":null,"abstract":"Deep learning (DL) has been applied in billions of mobile devices due to its astonishing performance in image, text, and audio processing. However, limited by the computing capability of mobile devices, a large number of DL inference tasks need to be offloaded to edge or cloud servers, which leaves even powerful GPU servers struggling to ensure the quality of service (QoS). To better utilize the highly parallel computing architecture of GPUs to improve the QoS, we propose BatOpt, a framework that uses dynamic batch processing to strike a good balance between service latency and GPU memory usage in DL inference services. Specifically, BatOpt innovatively models the DL inference service as an <inline-formula><tex-math>$M/G(a,b)/1/N$</tex-math></inline-formula> queue, taking stochastic task arrivals into consideration, which enables it to predict the service latency accurately in different system states. Furthermore, we propose an optimization algorithm to trade off the service latency and GPU memory usage in different system states by analyzing the queueing model. We have implemented BatOpt on PyTorch and evaluated it on an RTX 2080 GPU using real DL models. BatOpt brings up to 31x and 4.3x performance improvements in terms of service latency, compared to single-input and fixed-batch-size strategies, respectively. 
Moreover, BatOpt's maximum GPU memory usage is only 0.3x that of the greedy-dynamic-batch-size strategy at the same service latency.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"174-185"},"PeriodicalIF":6.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139951519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"psvCNN: A Zero-Knowledge CNN Prediction Integrity Verification Strategy","authors":"Yongkai Fan;Binyuan Xu;Linlin Zhang;Gang Tan;Shui Yu;Kuan-Ching Li;Albert Zomaya","doi":"10.1109/TCC.2024.3350233","DOIUrl":"10.1109/TCC.2024.3350233","url":null,"abstract":"Model prediction based on machine learning is provided as a service in cloud environments, but verifying that the model prediction service has been conducted in its entirety remains a critical challenge. Although zero-knowledge proof techniques can potentially solve the integrity verification problem, when applied to the prediction integrity of massive privacy-preserving Convolutional Neural Networks (CNNs), the significant proof burden results in low practicality. In this research, we present psvCNN (a parallel splitting zero-knowledge technique for integrity verification). The psvCNN scheme effectively improves the utilization of computational resources in CNN prediction integrity proving through an independent splitting design. Through a convolutional kernel-based model splitting design and an underlying zero-knowledge succinct non-interactive argument of knowledge, our psvCNN develops parallelizable zero-knowledge proof circuits for CNN prediction. Furthermore, psvCNN presents an updated Freivalds algorithm for a faster integrity verification process. In terms of proving time and storage, experiments show that psvCNN is practical and efficient. psvCNN generates a prediction integrity proof with a proof size of 1.2 MB in 7.65 s for the structurally complicated CNN model VGG16. 
psvCNN is 3765 times faster than the latest zk-SNARK-based non-interactive method vCNN and 12 times faster than the latest sumcheck-based interactive technique zkCNN in terms of proving time.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"359-369"},"PeriodicalIF":6.5,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139951405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VSA-SD: A Service Discovery Method Based on Vector Symbol Architecture for Low-Cost IoT System Development","authors":"Haiming Chen;Lei Wang;Wei Qin;Xinyan Zhou;Li Cui","doi":"10.1109/TCC.2023.3344512","DOIUrl":"https://doi.org/10.1109/TCC.2023.3344512","url":null,"abstract":"In recent years, with the widening applications of the Internet of Things (IoT), more and more perception services (e.g., air quality indicator services and road traffic congestion monitoring services) with different arguments (e.g., data type, source location, and creator) are being deployed by dedicated IT infrastructure service providers for constructing customized IoT systems at low cost by subscription. Checking through a discovery method whether the required perception services with the specified arguments are already available for the IoT system under construction is therefore an indispensable step to reduce redundant service deployment. However, it is challenging to design efficient (i.e., achieving high accuracy and low response delay with low overhead), highly robust, and trustworthy mechanisms for discovering perception services on resource-constrained IoT devices. To solve this problem, we propose a distributed service discovery method, named VSA-SD, based on the Vector Symbolic Architecture (VSA). This method employs hyperdimensional vectors to describe services in a distributed manner and measures the degree of service matching by calculating the Hamming distance, thereby achieving service discovery. We implemented VSA-SD in NBUFlow, an IoT task construction and offloading test platform, and evaluated its performance through comprehensive experiments. 
Results show that VSA-SD outperforms centralized, hybrid, and other distributed service discovery mechanisms in terms of accuracy, response delay, overhead, robustness, trustworthiness, interoperability, and mobility.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"145-158"},"PeriodicalIF":6.5,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Mode Instance-Intensive Workflow Task Batch Scheduling in Containerized Hybrid Cloud","authors":"An Liu;Ming Gao;Jiafu Tang","doi":"10.1109/TCC.2023.3344194","DOIUrl":"https://doi.org/10.1109/TCC.2023.3344194","url":null,"abstract":"The migration of containerized microservices from virtual machines (VMs) to cloud data centers has become the most advanced deployment technique for large software applications in the cloud. This study investigates the scheduling of instance-intensive workflow (IWF) tasks to be executed in containers on a hybrid cloud when computational resources are limited. Scheduling these IWF tasks becomes complicated when the deployment time of containers, the inter-task communication time, and the task dependencies are considered simultaneously, particularly when a task can choose among multiple execution modes due to the flexible computational resource allocation of containers. We propose a batch scheduling strategy (BSS) for the IWF task scheduling problem. The BSS prioritizes, with a certain probability, the execution of IWF tasks with high repetition rates and records the virtual machines and modes selected for task execution, which can reduce the data transfer time and the randomness of computation. Based on this, we use an improved hybrid algorithm combined with BSS to solve the multi-mode IWF task scheduling problem. The experimental results demonstrate that employing the BSS can reduce the scheduling time by 6% when the number of workflows increases to 80. Additionally, we tested the effectiveness of all operators in the algorithm, and the results show that each step of the algorithm yields good performance. 
Compared to similar algorithms in related studies, the overall algorithm achieves a maximum reduction of approximately 18% in the objective value.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"159-173"},"PeriodicalIF":6.5,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pay-Per-Proof: Decentralized Outsourced Multi-User PoR for Cloud Storage Payment Using Blockchain","authors":"Hui Cui;Zhiguo Wan;Tianyu Zhaolu;Huaqun Wang;Atsuko Miyaji","doi":"10.1109/TCC.2023.3343710","DOIUrl":"https://doi.org/10.1109/TCC.2023.3343710","url":null,"abstract":"Cloud computing has been widely applied in data storage, but it is not armed with an efficient integrity check mechanism for users to learn whether their large volumes of data have been kept intact by the cloud. The concept of proofs of retrievability (PoR) was introduced to address this issue by enabling users to check the integrity of their data stored by the cloud. However, PoR requires users to send queries to the cloud regularly, and its integrity check method cannot be extended to share the verification responsibility in the multi-user setting, where different users store the same data in the cloud. With such concerns in mind, we put forth a notion called outsourced multi-user proofs of retrievability (<inline-formula><tex-math>$\\mathtt{OMTPoR}$</tex-math></inline-formula>), which allows users storing the same data in the cloud to share the information used for the integrity check, and a third party is required to regularly check data integrity on behalf of the users using the shared information. We give a concrete construction of <inline-formula><tex-math>$\\mathtt{OMTPoR}$</tex-math></inline-formula> based on the homomorphic property of an existing scheme and analyze its security. To enforce honest integrity checks, we build the concrete <inline-formula><tex-math>$\\mathtt{OMTPoR}$</tex-math></inline-formula> construction over the blockchain using smart contracts to guarantee the honesty of participants, yielding a decentralized outsourced multi-user PoR solution that utilizes the blockchain miners as the third parties. 
Furthermore, our solution enables the cloud server to obtain payment for the storage service once the PoR is verified by the miners. We fully implement the <inline-formula><tex-math>$\\mathtt{OMTPoR}$</tex-math></inline-formula> scheme over the blockchain to evaluate its performance, which demonstrates clear superiority over traditional PoR schemes that do not detect data duplication.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"130-144"},"PeriodicalIF":6.5,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Approximation Algorithms for Scheduling Coflows With Total Weighted Completion Time in Identical Parallel Networks","authors":"Chi-Yeh Chen","doi":"10.1109/TCC.2023.3340729","DOIUrl":"https://doi.org/10.1109/TCC.2023.3340729","url":null,"abstract":"This article addresses the scheduling problem of coflows in identical parallel networks, a well-known <inline-formula><tex-math>$\\mathcal{NP}$</tex-math></inline-formula>-hard problem. We consider both flow-level scheduling and coflow-level scheduling problems. In the flow-level scheduling problem, flows within a coflow can be transmitted through different network cores, while in the coflow-level scheduling problem, flows within a coflow must be transmitted through the same network core. The key difference between these two problems lies in their scheduling granularity. Previous approaches relied on linear programming to determine the scheduling order. In this article, we improve the solving efficiency by utilizing the primal-dual method. For the flow-level scheduling problem, we propose an approximation algorithm that achieves approximation ratios of <inline-formula><tex-math>$6-\\frac{2}{m}$</tex-math></inline-formula> and <inline-formula><tex-math>$5-\\frac{2}{m}$</tex-math></inline-formula> for arbitrary and zero release times, respectively, where <inline-formula><tex-math>$m$</tex-math></inline-formula> represents the number of network cores. Additionally, for the coflow-level scheduling problem, we introduce an approximation algorithm that achieves approximation ratios of <inline-formula><tex-math>$4m+1$</tex-math></inline-formula> and <inline-formula><tex-math>$4m$</tex-math></inline-formula> for arbitrary and zero release times, respectively. The algorithms presented in this article have practical applications in data centers, such as those operated by Google or Facebook. 
The simulation results demonstrate the superior performance of our algorithms compared to previous approaches, emphasizing their practical utility.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"116-129"},"PeriodicalIF":6.5,"publicationDate":"2023-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10349921","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Cloud Data Lake Queries With a Balanced Coverage Plan","authors":"Grisha Weintraub;Ehud Gudes;Shlomi Dolev;Jeffrey D. Ullman","doi":"10.1109/TCC.2023.3339208","DOIUrl":"https://doi.org/10.1109/TCC.2023.3339208","url":null,"abstract":"Cloud data lakes have emerged as an inexpensive solution for storing very large amounts of data. The main idea is the separation of the compute and storage layers: cheap cloud storage is used for storing the data, while compute engines run analytics on this data in “on-demand” mode. However, to perform any computation on the data in this architecture, the data must be moved from the storage layer to the compute layer over the network for each calculation, which obviously hurts calculation performance and requires huge network bandwidth. In this paper, we study different approaches to improving query performance in a data lake architecture. We define an optimization problem that can provably speed up data lake queries. We prove that the problem is NP-hard and suggest heuristic approaches. Then, we demonstrate through experiments that our approach is feasible and efficient (up to 30x query execution time improvement on the TPC-H benchmark).","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"84-99"},"PeriodicalIF":6.5,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated Computation Offloading, UAV Trajectory Control, Edge-Cloud and Radio Resource Allocation in SAGIN","authors":"Minh Dat Nguyen;Long Bao Le;André Girard","doi":"10.1109/TCC.2023.3339394","DOIUrl":"https://doi.org/10.1109/TCC.2023.3339394","url":null,"abstract":"In this article, we study the computation offloading problem in hybrid edge-cloud based space-air-ground integrated networks (SAGIN), where joint optimization of partial computation offloading, unmanned aerial vehicle (UAV) trajectory control, user scheduling, edge-cloud computation, radio resource allocation, and admission control is performed. Specifically, the considered SAGIN employs multiple UAV-mounted edge servers with controllable UAV trajectories and a cloud server that can be reached by ground users (GUs) via multi-hop low-earth-orbit (LEO) satellite communications. This design aims to minimize the weighted energy consumption of the GUs and UAVs while satisfying the maximum delay constraints of the underlying computation tasks. To tackle the underlying non-convex mixed-integer non-linear optimization problem, we use an alternating optimization approach in which we iteratively solve four sub-problems, namely user scheduling, partial offloading control and bit allocation over time slots, computation resource and bandwidth allocation, and multi-UAV trajectory control, until convergence. Moreover, feasibility verification and admission control strategies are proposed to handle overloaded network scenarios. Furthermore, the successive convex approximation (SCA) method is employed to convexify and solve the non-convex computation resource and bandwidth allocation and UAV trajectory control sub-problems. 
Via extensive numerical studies, we illustrate the effectiveness of our proposed design compared to baselines.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"100-115"},"PeriodicalIF":6.5,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Publicly Verifiable Outsourcing Matrix Computation Scheme Based on Smart Contracts","authors":"Hao Wang;Chunpeng Ge;Lu Zhou;Zhe Liu;Dongwan Lan;Xiaozhen Lu;Danni Jiang","doi":"10.1109/TCC.2023.3337848","DOIUrl":"https://doi.org/10.1109/TCC.2023.3337848","url":null,"abstract":"Matrix computation is a crucial mathematical tool in scientific fields such as Artificial Intelligence and Cryptographic computation. However, it is difficult for resource-limited devices to execute large-scale matrix computations independently. Outsourcing matrix computation (OMC) is a promising solution that engages a cloud server to process complicated matrix computations for resource-limited devices. However, existing OMC schemes lack public verifiability, and thus resource-limited devices cannot verify the correctness of the computing results. In this paper, for the first time, we propose a smart contract-based OMC scheme that publicly verifies the outsourced matrix computation results. In our scheme, a smart contract running over the blockchain serves as a decentralized trusted third party to ensure the correctness of the matrix computation results. To overcome the Verifier's Dilemma in the blockchain, we present a blockchain-compatible matrix verification method that decreases the time complexity from <inline-formula><tex-math>$O(n^{3})$</tex-math></inline-formula> to <inline-formula><tex-math>$O(n^{2})$</tex-math></inline-formula> by utilizing a blinding method with the check digit and padding matrices. We turn verification into comparing whether two results are identical rather than naively re-computing them. 
Finally, we perform experiments on Ethereum and an ARM Cortex-M4 and give an in-depth analysis and performance evaluation, demonstrating our scheme's practicability and effectiveness.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"70-83"},"PeriodicalIF":6.5,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}