Xiao Wang;Hanchuan Xu;Jian Yang;Xiaofei Xu;Zhongjie Wang
{"title":"A Collaborative Service Composition Approach Considering Providers’ Self-Interest and Minimal Service Sharing","authors":"Xiao Wang;Hanchuan Xu;Jian Yang;Xiaofei Xu;Zhongjie Wang","doi":"10.1109/TPDS.2025.3534283","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3534283","url":null,"abstract":"Service composition dynamically integrates various services from multiple providers to meet complex user requirements. However, most existing methods assume centralized control over all services, which is often unrealistic because providers typically prefer to independently manage their own services, posing challenges to the application of traditional methods. Collaborative service composition offers a solution by enabling providers to work together to complete service composition. However, this approach also faces its own challenges. Driven by self-interest, providers may be reluctant to offer services needed by others, and due to business competition, they may wish to share as few services as possible (where sharing services means disclosing service information to other providers). To address these challenges, we propose a novel collaborative service composition approach that comprehensively considers each provider’s self-interest and achieves service composition with minimal service sharing. First, we introduce a “self-interest degree” model to capture providers’ self-interest. This behavior may lead to service refusal, so we design a service availability prediction method based on a reputation model to minimize rejections. Then, we propose a decentralized service composition method. It utilizes historical composition records to mine empirical rules between requirements and services, constructing a correlation matrix, and collaboratively trains a multi-label classification model with other providers under a distributed federated learning framework. 
Combining the matrix and model outputs, we design a service composition method and a node coordination protocol that completes service composition with minimal service sharing. Experimental results demonstrate the effectiveness of the proposed method in capturing providers’ self-interest and showcase its superior performance compared to existing methods.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"598-615"},"PeriodicalIF":5.6,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xue Jiang;Hengfeng Wei;Yu Huang;Yuxing Chen;Anqun Pan
{"title":"A Generic Specification Framework for Weakly Consistent Replicated Data Types","authors":"Xue Jiang;Hengfeng Wei;Yu Huang;Yuxing Chen;Anqun Pan","doi":"10.1109/TPDS.2025.3533546","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3533546","url":null,"abstract":"Burckhardt et al. proposed a formal specification framework for eventually consistent replicated data types, denoted <inline-formula><tex-math>$(vis, ar)$</tex-math></inline-formula>, based on the notions of visibility and arbitration relations. However, being specific to eventually consistent systems, this framework has two limitations. First, it does not cover non-convergent consistency models since arbitration <inline-formula><tex-math>$ar$</tex-math></inline-formula> is a total order over events. Second, it does not cover the consistency models in which each event is required to be aware of the return values of some events that are visible to it when justifying its return value. These limitations make the <inline-formula><tex-math>$(vis, ar)$</tex-math></inline-formula> framework not generic enough to specify and reason about important weak consistency models such as Causal Memory and PRAM. In this article, we extend this framework to a more generic one called <inline-formula><tex-math>$(vis, ar, V)$</tex-math></inline-formula> for weakly consistent replicated data types. To specify non-convergent consistency models as well, we relax the arbitration relation <inline-formula><tex-math>$ar$</tex-math></inline-formula> to be a partial order. To overcome the second limitation, we allow to specify for each event <inline-formula><tex-math>$e$</tex-math></inline-formula>, a subset <inline-formula><tex-math>$V(e)$</tex-math></inline-formula> of its visible set whose return values cannot be ignored when justifying the return value of <inline-formula><tex-math>$e$</tex-math></inline-formula>. 
To make it practically feasible, we provide candidates for the visibility and arbitration relations and the <inline-formula><tex-math>$V$</tex-math></inline-formula> function. By combining candidates for these three components, we are able to specify not only existing consistency models but also new ones that are reasonable and promising for practical usefulness. We then show how to specify consistency models in our framework, and provide three case studies.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 6","pages":"1338-1353"},"PeriodicalIF":5.6,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoquan Zhang;Lin Cui;Fung Po Tso;Yuhui Deng;Zhetao Li;Weijia Jia
{"title":"Monte: SFCs Migration Scheme in the Distributed Programmable Data Plane","authors":"Xiaoquan Zhang;Lin Cui;Fung Po Tso;Yuhui Deng;Zhetao Li;Weijia Jia","doi":"10.1109/TPDS.2025.3532467","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3532467","url":null,"abstract":"Service function chains (SFCs) are sequences of network functions that provide specific services to meet operators’ needs in today's ISPs and datacenter networks. To improve the performance of SFCs, programmable data planes are used to leverage their low latency and high performance packet processing. However, SFCs need to be adaptable to dynamics such as changes in requirements and attributes. Therefore, the ability to migrate SFCs is essential. Unfortunately, migrating SFCs in distributed programmable data planes is challenging due to the risk of degraded performance and failure to meet SFCs requirements and resource constraints in switches. In this paper, we propose <italic>Monte</italic>, which provides an effective SFCs migration scheme in distributed programmable data planes. We build a novel integer programming model to represent the migration process with constraints on resource limitations of switches and SFCs attributes in the distributed data plane. Additionally, an SFCs migration algorithm is designed to optimize the migration cost by deeply analyzing resource allocation in the switch pipeline. <italic>Monte</italic> has been implemented on both P4 software switches (Bmv2) and hardware switches (Intel Tofino ASIC). 
Extensive evaluation results show that the migration cost in <italic>Monte</italic> is 94.03% lower on average than the state-of-the-art deployment scheme, and <italic>Monte</italic> can effectively save pipeline resources.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 4","pages":"633-644"},"PeriodicalIF":5.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143535517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative Edge-Cloud Data Transfer Optimization for Industrial Internet of Things","authors":"Xinchang Zhang;Maoli Wang;Xiaomin Zhu;Zhiwei Yan;Guanggang Geng","doi":"10.1109/TPDS.2025.3532261","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3532261","url":null,"abstract":"In the Industrial Internet of Things, it is necessary to reserve enough bandwidth resources according to the maximum traffic peak. However, bandwidth reservation based on the maximum traffic peak leads to low resource utilization. In this paper, we propose a data transfer optimization solution, based on the cooperation of different entities in the local area, which strives to deliver data acquired by sensors to the cloud in a reliable manner and improve bandwidth utilization to save limited network resources. In our solution, the data transfers from the sensors in a local network are controlled by a local controller and some edge gateways with acceptable cost such that no congestion occurs in the path to the cloud and the bandwidth requirement of each flow can be met. To obtain a tradeoff between resource utilization and transfer delay, we study the problem of minimizing the maximum rate peak of periodic real-time traffic from distributed sensors and propose an algorithm to solve this problem with a desirable lower boundary of the performance. In addition, we design an application-level forwarding method that significantly improves resource utilization and a method of implementing reliable sampling instant adjustment. 
The experimental results show that our solution significantly improves resource utilization without producing network congestion.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"580-597"},"PeriodicalIF":5.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survey on Characterizing and Understanding GNNs From a Computer Architecture Perspective","authors":"Meng Wu;Mingyu Yan;Wenming Li;Xiaochun Ye;Dongrui Fan;Yuan Xie","doi":"10.1109/TPDS.2025.3532089","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3532089","url":null,"abstract":"Characterizing and understanding graph neural networks (GNNs) is essential for identifying performance bottlenecks and facilitating their deployment in parallel and distributed systems. Despite substantial work in this area, a comprehensive survey on characterizing and understanding GNNs from a computer architecture perspective is lacking. This article presents a comprehensive survey, proposing a triple-level classification method to categorize, summarize, and compare existing efforts, particularly focusing on their implications for parallel architectures and distributed systems. We identify promising future directions for GNN characterization that align with the challenges of optimizing hardware and software in parallel and distributed systems. Our survey aims to help scholars systematically understand GNN performance bottlenecks and execution patterns from a computer architecture perspective, thereby contributing to the development of more efficient GNN implementations across diverse parallel architectures and distributed systems.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"537-552"},"PeriodicalIF":5.6,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Note on “AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems”","authors":"Zhengjun Cao","doi":"10.1109/TPDS.2025.3531446","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3531446","url":null,"abstract":"We show that the attribute-based encrypted search protocol [IEEE TPDS, 2023, 34(1), 92–107] is insecure against unauthorized user querying attack, because an adversary can convert a valid query from any authorized user into a new legitimate query, while the server cannot detect the fraud.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 4","pages":"675-676"},"PeriodicalIF":5.6,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Response Time Analysis and Optimal Priority Assignment for Global Non-Preemptive Fixed-Priority Rigid Gang Scheduling","authors":"Binqi Sun;Tomasz Kloda;Jiyang Chen;Cen Lu;Marco Caccamo","doi":"10.1109/TPDS.2025.3529218","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3529218","url":null,"abstract":"Non-preemptive rigid gang scheduling combines the efficiency of parallel execution with the reduced overhead of non-preemptive scheduling. This approach is particularly advantageous for parallel hardware accelerators, such as Google's Edge Tensor Processing Unit (TPU), which is widely used for deep neural network (DNN) inference on embedded systems. This paper studies sporadic global non-preemptive fixed-priority (NP-FP) rigid gang scheduling, which is well-suited for DNN applications in Edge TPU pipelines. Each gang task spawns a fixed number of threads that must execute concurrently across distinct processing units. We introduce the first carry-in limitation technique specifically designed for gang task response time analysis, addressing the unique challenges posed by intra-task parallelism. This technique is formulated as a generalized knapsack problem, and we develop both a linear programming relaxation and a dynamic programming approach to solve it under different time complexities. Additionally, we propose the first optimal priority assignment policy for NP-FP gang schedulability tests. Our proposed schedulability analysis and optimal priority assignment policy are evaluated through extensive experiments, including both synthetic task sets and a case study using DNN benchmarks on commercial off-the-shelf Edge TPU accelerators. The results demonstrate that the proposed approaches effectively enhance the state-of-the-art global NP-FP gang schedulability tests, achieving improvements of up to 57.9% for synthetic task sets and 76.7% for Edge TPU benchmarks. 
Furthermore, we conduct an ablation study to examine the impact of different algorithmic components in the proposed technique, providing valuable insights for future research.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"455-470"},"PeriodicalIF":5.6,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840299","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Study of Sampling Methods With Cross-Validation in the FedHome Framework","authors":"Arash Ahmadi;Sarah S. Sharif;Yaser M. Banad","doi":"10.1109/TPDS.2025.3526238","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3526238","url":null,"abstract":"This article presents a comparative study of sampling methods within the FedHome framework, designed for personalized in-home health monitoring. FedHome leverages federated learning (FL) and generative convolutional autoencoders (GCAE) to train models on decentralized edge devices while prioritizing data privacy. A notable challenge in this domain is the class imbalance in health data, where critical events such as falls are underrepresented, adversely affecting model performance. To address this, the research evaluates six oversampling techniques using Stratified K-fold cross-validation: SMOTE, Borderline-SMOTE, Random OverSampler, SMOTE-Tomek, SVM-SMOTE, and SMOTE-ENN. These methods are tested on FedHome's public implementation over 200 training rounds with and without stratified K-fold cross-validation. The findings indicate that SMOTE-ENN achieves the most consistent test accuracy, with a standard deviation range of 0.0167–0.0176, demonstrating stable performance compared to other samplers. In contrast, SMOTE and SVM-SMOTE exhibit higher variability in performance, as reflected by their wider standard deviation ranges of 0.0157–0.0180 and 0.0155–0.0180, respectively. Similarly, the Random OverSampler method shows a significant deviation range of 0.0155–0.0176. SMOTE-Tomek, with a deviation range of 0.0160–0.0175, also shows greater stability but not as much as SMOTE-ENN. 
This finding highlights the potential of SMOTE-ENN to enhance the reliability and accuracy of personalized health monitoring systems within the FedHome framework.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"570-579"},"PeriodicalIF":5.6,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPABE: GPU-Based Parallelization Framework for Attribute-Based Encryption Schemes","authors":"Wenhan Xu;Hui Ma;Rui Zhang;Jianhao Li","doi":"10.1109/TPDS.2025.3529776","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3529776","url":null,"abstract":"Attribute-based encryption (ABE) has emerged as a new paradigm for access control in cloud computing. However, despite the many promising features of ABE, its deployment in real-world systems is still limited, partially due to the expensive cost of its underlying mathematical operations, which often grow linearly with the size and complexity of the system's security policies. This becomes particularly challenging in data-intensive applications, where multiple users may simultaneously access and manipulate large volumes of data, resulting in high levels of concurrency and demand for computing resources, which are too heavy even for high-end servers. Further exacerbating the issues are the functionality and security requirements of a cloud, as they introduce additional computations to both the client and the server. Therefore, in this work, we introduce <inline-formula><tex-math>$\\mathsf{GPABE}$</tex-math></inline-formula>, the first GPU-based parallelization framework for ABE to facilitate its batch processing in cloud computing. By analyzing ABE's major computational workload, we identify multiple arithmetic modules that are common in the design of pairing-based ABEs. Based on the analysis, we further propose to decompose the ABE algorithm into a computation graph, which can be efficiently implemented on the GPU platform. Our graph representation bridges the gap between ABE's high-level design and its low-level implementation on GPUs, and is applicable to a variety of popular schemes in the realm of ABE. We then implement <inline-formula><tex-math>$\\mathsf{GPABE}$</tex-math></inline-formula> as a heterogeneous computing server, with several optimization techniques to improve its throughput. 
Finally, we evaluate the GPU implementation of several ABE schemes using <inline-formula><tex-math>$\\mathsf{GPABE}$</tex-math></inline-formula>. The results show a speedup of at least 51.0× and at most 253.6× for the throughput of ABE algorithms, compared to their state-of-the-art CPU implementations, which preliminarily demonstrates the effectiveness of <inline-formula><tex-math>$\\mathsf{GPABE}$</tex-math></inline-formula>.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"520-536"},"PeriodicalIF":5.6,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lizhen Zhou;Zichuan Xu;Qiufen Xia;Zhou Xu;Wenhao Ren;Wenbo Qi;Jinjing Ma;Song Yan;Yuan Yang
{"title":"Chasing Common Knowledge: Joint Large Model Selection and Pulling in MEC With Parameter Sharing","authors":"Lizhen Zhou;Zichuan Xu;Qiufen Xia;Zhou Xu;Wenhao Ren;Wenbo Qi;Jinjing Ma;Song Yan;Yuan Yang","doi":"10.1109/TPDS.2025.3527649","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3527649","url":null,"abstract":"Pretrained Foundation Models (PFMs) are regarded as a promising accelerator for the development of various Artificial Intelligence (AI) applications, and have recently been widely fine-tuned to satisfy users’ personalized inference demands. As many users are attracted to PFM-based AI applications, remote data centers are increasingly unable to solely bear the enormous computational demands and meet the delay requirements of inference requests. Mobile edge computing (MEC) offers a viable solution for delivering low-latency inference services by pulling fine-tuned PFMs from the remote data center to cloudlets in the proximity of users. However, a fine-tuned PFM typically comprises billions of model parameters, which are highly resource-intensive, time-consuming, and cost-prohibitive to execute at the edge. To address this, we investigate a novel joint large model selection and pulling problem in MEC networks. The novelty of our study lies in exploring parameter sharing among fine-tuned PFMs based on their common knowledge. Specifically, we first formulate a Non-Linear Integer Programming (NLIP) for the problem to minimize the total delay of implementing all inference requests. We then transform the NLIP into an equivalent Integer Linear Program (ILP) that is much simpler to solve. We further propose a randomized algorithm with a provable approximation ratio for the problem. We also consider the online version of the problem with uncertain request demand, and develop an online learning algorithm with a bounded regret. 
The crux of the online algorithm is the adoption of the multi-armed bandit technique with restricted context for dynamic admissions of inference requests. We finally conduct extensive experiments based on real datasets. Experimental results demonstrate that our algorithms reduce total delays and average costs by at least 38%, while achieving a 5% improvement in average accuracy.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"437-454"},"PeriodicalIF":5.6,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}