IEEE Transactions on Parallel and Distributed Systems: Latest Articles

Parallel Multi Objective Shortest Path Update Algorithm in Large Dynamic Networks
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-30 DOI: 10.1109/TPDS.2025.3536357
S. M. Shovan;Arindam Khanda;Sajal K. Das
{"title":"Parallel Multi Objective Shortest Path Update Algorithm in Large Dynamic Networks","authors":"S. M. Shovan;Arindam Khanda;Sajal K. Das","doi":"10.1109/TPDS.2025.3536357","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3536357","url":null,"abstract":"The multi objective shortest path (MOSP) problem, crucial in various practical domains, seeks paths that optimize multiple objectives. Due to its high computational complexity, numerous parallel heuristics have been developed for static networks. However, real-world networks are often dynamic where the network topology changes with time. Efficiently updating the shortest path in such networks is challenging, and existing algorithms for static graphs are inadequate for these dynamic conditions, necessitating novel approaches. Here, we first develop a parallel algorithm to efficiently update a single objective shortest path (SOSP) in fully dynamic networks, capable of accommodating both edge insertions and deletions. Building on this, we propose <italic><bold>DynaMOSP</bold></italic>, a parallel heuristic for <bold>Dyna</bold>mic <bold>M</bold>ulti <bold>O</bold>bjective <bold>S</bold>hortest <bold>P</bold>ath searches in large, fully dynamic networks. We provide a theoretical analysis of the conditions to achieve Pareto optimality. Furthermore, we devise a dedicated shared memory CPU implementation along with a version for heterogeneous computing environments. Empirical analysis on eight real-world graphs demonstrates that our method scales effectively. 
The shared memory CPU implementation achieves an average speedup of 12.74× and a maximum of 57.22×, while on an Nvidia GPU, it attains an average speedup of 69.19×, reaching up to 105.39× when compared to state-of-the-art techniques.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 5","pages":"932-944"},"PeriodicalIF":5.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143808947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Loci: Federated Continual Learning of Heterogeneous Tasks at Edge
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-29 DOI: 10.1109/TPDS.2025.3531123
Yaxin Luopan;Rui Han;Qinglong Zhang;Xiaojiang Zuo;Chi Harold Liu;Guoren Wang;Lydia Y. Chen
{"title":"Loci: Federated Continual Learning of Heterogeneous Tasks at Edge","authors":"Yaxin Luopan;Rui Han;Qinglong Zhang;Xiaojiang Zuo;Chi Harold Liu;Guoren Wang;Lydia Y. Chen","doi":"10.1109/TPDS.2025.3531123","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3531123","url":null,"abstract":"Federated continual learning (FCL) has attracted growing attention in achieving collaborative model training among edge clients, each of which learns its local model for a sequence of tasks. Most existing FCL approaches aggregate clients’ latest local models to exchange knowledge. This unfortunately deviates from real-world scenarios where each model is optimized independently using the client’s own dynamic data and different clients have heterogeneous tasks. These tasks not only have distinct class labels (e.g., animals or vehicles) but also differ in input feature distributions. The aggregated model thus often shifts to a higher loss value and incurs accuracy degradation. In this article, we depart from the model-grained view of aggregation and transform it into multiple task-grained aggregations. Each aggregation allows a client to learn from other clients to improve its model accuracy on one task. To this end, we propose Loci to provide abstractions for clients’ past and peer task knowledge using compact model weights, and develop a communication-efficient approach to train each client’s local model by exchanging its tasks’ knowledge with the most accuracy-relevant one from other clients. Through its general-purpose API, Loci can be used to provide efficient on-device training for existing deep learning applications of graph, image, natural language processing, and multimodal data. 
Using extensive comparative evaluations, we show Loci improves the model accuracy by 32.48% without increasing training time, reduces communication cost by 83.6%, and achieves more improvements when scale (task/client number) increases.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 4","pages":"775-790"},"PeriodicalIF":5.6,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FHE4DMM: A Low-Latency Distributed Matrix Multiplication With Fully Homomorphic Encryption
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-28 DOI: 10.1109/TPDS.2025.3534846
Yi Chen;Qiang-Sheng Hua;Zixiao Hong;Lin Zhu;Hai Jin
{"title":"FHE4DMM: A Low-Latency Distributed Matrix Multiplication With Fully Homomorphic Encryption","authors":"Yi Chen;Qiang-Sheng Hua;Zixiao Hong;Lin Zhu;Hai Jin","doi":"10.1109/TPDS.2025.3534846","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3534846","url":null,"abstract":"Fully Homomorphic Encryption (FHE) is a promising technology for secure, non-interactive outsourced computation. One notable method to increase the throughput of FHE-based outsourcing is batching, which typically involves large-scale matrix-matrix multiplications (MM). However, the substantial overhead inherent in existing FHE schemes poses a major challenge for processing these large-scale tasks, often resulting in insufficient memory or prolonged delays on a single machine, making it practically unviable. Utilizing multi-machine parallelism in cloud clusters for outsourced computation offers a natural solution to these obstacles. In this work, we propose FHE4DMM, a distributed algorithm that provides a unified view on encrypted matrices, accommodating various FHE schemes and any matrix dimensions, to accelerate large-scale encrypted MM. A key innovation is its reuse optimizations for parallelized homomorphic computations, which can offer valuable insights for broader FHE-based applications. We utilized FHE4DMM to conduct large-scale square (<inline-formula><tex-math>$4096\\times 4096$</tex-math></inline-formula>) and rectangular (<inline-formula><tex-math>$32768\\times 32768, 32768\\times 16$</tex-math></inline-formula>) matrix multiplications on 256 machines, achieving computation time of 172.2 s and 76.1 s, respectively, while ensuring a 128-bit security level. For scalability, the experiments demonstrate that FHE4DMM achieves linear speedup for <inline-formula><tex-math>$2^{i}$</tex-math></inline-formula> (<inline-formula><tex-math>$i$</tex-math></inline-formula> is from 0 to 6) machines across various matrix dimension cases. 
In addition, within the range of matrix dimensions that the state-of-the-art (SOTA) distributed FHE-MM algorithm (Huang et al. 2023) can handle, FHE4DMM attains a maximum speedup of 16.62x. To assess its practical performance, FHE4DMM is applied in a basic multi-layer feedforward network. We used 64 machines to perform secure outsourced inference on MNIST and CIFAR-10 datasets with encrypted models and data. Compared to using the SOTA, our method achieved speedups of up to 3.54x and 4.22x respectively, with the MM module obtaining a 4.09x and 4.87x speedup.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 4","pages":"645-658"},"PeriodicalIF":5.6,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10856418","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Collaborative Service Composition Approach Considering Providers’ Self-Interest and Minimal Service Sharing
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-27 DOI: 10.1109/TPDS.2025.3534283
Xiao Wang;Hanchuan Xu;Jian Yang;Xiaofei Xu;Zhongjie Wang
{"title":"A Collaborative Service Composition Approach Considering Providers’ Self-Interest and Minimal Service Sharing","authors":"Xiao Wang;Hanchuan Xu;Jian Yang;Xiaofei Xu;Zhongjie Wang","doi":"10.1109/TPDS.2025.3534283","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3534283","url":null,"abstract":"Service composition dynamically integrates various services from multiple providers to meet complex user requirements. However, most existing methods assume centralized control over all services, which is often unrealistic because providers typically prefer to independently manage their own services, posing challenges to the application of traditional methods. Collaborative service composition offers a solution by enabling providers to work together to complete service composition. However, this approach also faces its own challenges. Driven by self-interest, providers may be reluctant to offer services needed by others, and due to business competition, they may wish to share as few services as possible (where sharing services means disclosing service information to other providers). To address these challenges, we propose a novel collaborative service composition approach that comprehensively considers each provider’s self-interest and achieves service composition with minimal service sharing. First, we introduce a “self-interest degree” model to capture providers’ self-interest. This behavior may lead to service refusal, so we design a service availability prediction method based on a reputation model to minimize rejections. Then, we propose a decentralized service composition method. It utilizes historical composition records to mine empirical rules between requirements and services, constructing a correlation matrix, and collaboratively trains a multi-label classification model with other providers under a distributed federated learning framework. 
Combining the matrix and model outputs, we design a service composition method and a node coordination protocol that completes service composition with minimal service sharing. Experimental results demonstrate the effectiveness of the proposed method in capturing providers’ self-interest and showcase its superior performance compared to existing methods.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"598-615"},"PeriodicalIF":5.6,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Generic Specification Framework for Weakly Consistent Replicated Data Types
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-24 DOI: 10.1109/TPDS.2025.3533546
Xue Jiang;Hengfeng Wei;Yu Huang;Yuxing Chen;Anqun Pan
{"title":"A Generic Specification Framework for Weakly Consistent Replicated Data Types","authors":"Xue Jiang;Hengfeng Wei;Yu Huang;Yuxing Chen;Anqun Pan","doi":"10.1109/TPDS.2025.3533546","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3533546","url":null,"abstract":"Burckhardt et al. proposed a formal specification framework for eventually consistent replicated data types, denoted <inline-formula><tex-math>$(vis, ar)$</tex-math></inline-formula>, based on the notions of visibility and arbitration relations. However, being specific to eventually consistent systems, this framework has two limitations. First, it does not cover non-convergent consistency models since arbitration <inline-formula><tex-math>$ar$</tex-math></inline-formula> is a total order over events. Second, it does not cover the consistency models in which each event is required to be aware of the return values of some events that are visible to it when justifying its return value. These limitations make the <inline-formula><tex-math>$(vis, ar)$</tex-math></inline-formula> framework not generic enough to specify and reason about important weak consistency models such as Causal Memory and PRAM. In this article, we extend this framework to a more generic one called <inline-formula><tex-math>$(vis, ar, V)$</tex-math></inline-formula> for weakly consistent replicated data types. To specify non-convergent consistency models as well, we relax the arbitration relation <inline-formula><tex-math>$ar$</tex-math></inline-formula> to be a partial order. To overcome the second limitation, we allow to specify for each event <inline-formula><tex-math>$e$</tex-math></inline-formula>, a subset <inline-formula><tex-math>$V(e)$</tex-math></inline-formula> of its visible set whose return values cannot be ignored when justifying the return value of <inline-formula><tex-math>$e$</tex-math></inline-formula>. 
To make it practically feasible, we provide candidates for the visibility and arbitration relations and the <inline-formula><tex-math>$V$</tex-math></inline-formula> function. By combining candidates for these three components, we are able to specify not only existing consistency models but also new ones that are reasonable and promising for practical usefulness. We then show how to specify consistency models in our framework, and provide three case studies.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 6","pages":"1338-1353"},"PeriodicalIF":5.6,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143929741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Monte: SFCs Migration Scheme in the Distributed Programmable Data Plane
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-21 DOI: 10.1109/TPDS.2025.3532467
Xiaoquan Zhang;Lin Cui;Fung Po Tso;Yuhui Deng;Zhetao Li;Weijia Jia
{"title":"Monte: SFCs Migration Scheme in the Distributed Programmable Data Plane","authors":"Xiaoquan Zhang;Lin Cui;Fung Po Tso;Yuhui Deng;Zhetao Li;Weijia Jia","doi":"10.1109/TPDS.2025.3532467","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3532467","url":null,"abstract":"Service function chains (SFCs) are sequences of network functions that provide specific services to meet operators’ needs in today's ISPs and datacenter networks. To improve the performance of SFCs, programmable data planes are used to leverage their low latency and high performance packet processing. However, SFCs need to be adaptable to dynamics such as changes in requirements and attributes. Therefore, the ability to migrate SFCs is essential. Unfortunately, migrating SFCs in distributed programmable data planes is challenging due to the risk of degraded performance and failure to meet SFCs requirements and resource constraints in switches. In this paper, we propose <italic>Monte</italic>, which provides an effective SFCs migration scheme in distributed programmable data planes. We build a novel integer programming model to represent the migration process with constraints on resource limitations of switches and SFCs attributes in the distributed data plane. Additionally, an SFCs migration algorithm is designed to optimize the migration cost by deeply analyzing resource allocation in the switch pipeline. <italic>Monte</italic> has been implemented on both P4 software switches (Bmv2) and hardware switches (Intel Tofino ASIC). 
Extensive evaluation results show that the migration cost in <italic>Monte</italic> is 94.03% lower on average than the state-of-the-art deployment scheme, and <italic>Monte</italic> can effectively save pipeline resources.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 4","pages":"633-644"},"PeriodicalIF":5.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143535517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Collaborative Edge-Cloud Data Transfer Optimization for Industrial Internet of Things
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-21 DOI: 10.1109/TPDS.2025.3532261
Xinchang Zhang;Maoli Wang;Xiaomin Zhu;Zhiwei Yan;Guanggang Geng
{"title":"Collaborative Edge-Cloud Data Transfer Optimization for Industrial Internet of Things","authors":"Xinchang Zhang;Maoli Wang;Xiaomin Zhu;Zhiwei Yan;Guanggang Geng","doi":"10.1109/TPDS.2025.3532261","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3532261","url":null,"abstract":"In the Industrial Internet of Things, it is necessary to reserve enough bandwidth resources according to the maximum traffic peak. However, bandwidth reservation based on the maximum traffic peak leads to low resource utilization. In this paper, we propose a data transfer optimization solution, based on the cooperation of different entities in the local area, which strives to deliver data acquired by sensors to the cloud in a reliable manner and improve bandwidth utilization to save limited network resources. In our solution, the data transfers from the sensors in a local network are controlled by a local controller and some edge gateways with acceptable cost such that no congestion occurs in the path to the cloud and the bandwidth requirement of each flow can be met. To obtain a tradeoff between resource utilization and transfer delay, we study the problem of minimizing the maximum rate peak of periodic real-time traffic from distributed sensors and propose an algorithm to solve this problem with a desirable lower boundary of the performance. In addition, we design an application-level forwarding method that significantly improves resource utilization and a method of implementing reliable sampling instant adjustment. 
The experimental results show that our solution significantly improves resource utilization without producing network congestion.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"580-597"},"PeriodicalIF":5.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Survey on Characterizing and Understanding GNNs From a Computer Architecture Perspective
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-20 DOI: 10.1109/TPDS.2025.3532089
Meng Wu;Mingyu Yan;Wenming Li;Xiaochun Ye;Dongrui Fan;Yuan Xie
{"title":"Survey on Characterizing and Understanding GNNs From a Computer Architecture Perspective","authors":"Meng Wu;Mingyu Yan;Wenming Li;Xiaochun Ye;Dongrui Fan;Yuan Xie","doi":"10.1109/TPDS.2025.3532089","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3532089","url":null,"abstract":"Characterizing and understanding graph neural networks (GNNs) is essential for identifying performance bottlenecks and facilitating their deployment in parallel and distributed systems. Despite substantial work in this area, a comprehensive survey on characterizing and understanding GNNs from a computer architecture perspective is lacking. This article presents a comprehensive survey, proposing a triple-level classification method to categorize, summarize, and compare existing efforts, particularly focusing on their implications for parallel architectures and distributed systems. We identify promising future directions for GNN characterization that align with the challenges of optimizing hardware and software in parallel and distributed systems. Our survey aims to help scholars systematically understand GNN performance bottlenecks and execution patterns from a computer architecture perspective, thereby contributing to the development of more efficient GNN implementations across diverse parallel architectures and distributed systems.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"537-552"},"PeriodicalIF":5.6,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Note on “AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems”
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-17 DOI: 10.1109/TPDS.2025.3531446
Zhengjun Cao
{"title":"A Note on “AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems”","authors":"Zhengjun Cao","doi":"10.1109/TPDS.2025.3531446","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3531446","url":null,"abstract":"We show that the attribute-based encrypted search protocol [IEEE TPDS, 2023, 34(1), 92–107] is insecure against unauthorized user querying attack, because an adversary can convert a valid query from any authorized user into a new legitimate query, while the server cannot detect the fraud.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 4","pages":"675-676"},"PeriodicalIF":5.6,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Response Time Analysis and Optimal Priority Assignment for Global Non-Preemptive Fixed-Priority Rigid Gang Scheduling
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-01-15 DOI: 10.1109/TPDS.2025.3529218
Binqi Sun;Tomasz Kloda;Jiyang Chen;Cen Lu;Marco Caccamo
{"title":"Response Time Analysis and Optimal Priority Assignment for Global Non-Preemptive Fixed-Priority Rigid Gang Scheduling","authors":"Binqi Sun;Tomasz Kloda;Jiyang Chen;Cen Lu;Marco Caccamo","doi":"10.1109/TPDS.2025.3529218","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3529218","url":null,"abstract":"Non-preemptive rigid gang scheduling combines the efficiency of parallel execution with the reduced overhead of non-preemptive scheduling. This approach is particularly advantageous for parallel hardware accelerators, such as Google's Edge Tensor Processing Unit (TPU), which is widely used for deep neural network (DNN) inference on embedded systems. This paper studies sporadic global non-preemptive fixed-priority (NP-FP) rigid gang scheduling, which is well-suited for DNN applications in Edge TPU pipelines. Each gang task spawns a fixed number of threads that must execute concurrently across distinct processing units. We introduce the first carry-in limitation technique specifically designed for gang task response time analysis, addressing the unique challenges posed by intra-task parallelism. This technique is formulated as a generalized knapsack problem, and we develop both a linear programming relaxation and a dynamic programming approach to solve it under different time complexities. Additionally, we propose the first optimal priority assignment policy for NP-FP gang schedulability tests. Our proposed schedulability analysis and optimal priority assignment policy are evaluated through extensive experiments, including both synthetic task sets and a case study using DNN benchmarks on commercial off-the-shelf Edge TPU accelerators. The results demonstrate that the proposed approaches effectively enhance the state-of-the-art global NP-FP gang schedulability tests, achieving improvements of up to 57.9% for synthetic task sets and 76.7% for Edge TPU benchmarks. 
Furthermore, we conduct an ablation study to examine the impact of different algorithmic components in the proposed technique, providing valuable insights for future research.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 3","pages":"455-470"},"PeriodicalIF":5.6,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840299","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0