Latest Articles in IEEE Transactions on Parallel and Distributed Systems

Guest Editorial: New Tools and Techniques for the Distributed Computing Continuum
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-10-16 | DOI: 10.1109/TPDS.2025.3612151
Jesus Carretero;Javier García-Blas;Sameer Shende
{"title":"Guest Editorial: New Tools and Techniques for the Distributed Computing Continuum","authors":"Jesus Carretero;Javier García-Blas;Sameer Shende","doi":"10.1109/TPDS.2025.3612151","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3612151","url":null,"abstract":"","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 12","pages":"2451-2454"},"PeriodicalIF":6.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11205817","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145351993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Large-Scale Neural Network Quantum States Calculation for Quantum Chemistry on a New Sunway Supercomputer
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-10-15 | DOI: 10.1109/TPDS.2025.3620251
Yangjun Wu;Wenhao Zhou;Li Shen;Hong Qian;Honghui Shang
{"title":"Large-Scale Neural Network Quantum States Calculation for Quantum Chemistry on a New Sunway Supercomputer","authors":"Yangjun Wu;Wenhao Zhou;Li Shen;Hong Qian;Honghui Shang","doi":"10.1109/TPDS.2025.3620251","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3620251","url":null,"abstract":"Quantum many-body system can be solved with neural-network method. Nonetheless, the practical deployment of neural network quantum states (NNQS) in large-scale electronic structure analyses faces challenges, chiefly the high sampling cost and the complexity of local energy computations. To overcome these computational barriers, we present an innovative data-parallel NNQS-Transformer implementation. This implementation introduces a hybrid multi-layer workload balancing strategy that effectively addresses previous load imbalance issues while leveraging Julia’s portability to achieve targeted performance optimizations. Through extensive testing, we validate our approach using comprehensive quantum chemistry calculations on systems containing up to 120 spin orbitals, where previous methods were limited to much smaller scales. The implementation demonstrates exceptional scalability on the Sunway platform, achieving 92% strong scaling and 98% weak scaling efficiencies when utilizing up to 37 million processor cores. These significant performance improvements mark a crucial step toward making NNQS calculations practical for real-world quantum chemistry applications.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 12","pages":"2724-2732"},"PeriodicalIF":6.0,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Chorus: Robust Multitasking Local Client-Server Collaborative Inference With Wi-Fi 6 for AIoT Against Stochastic Congestion Delay
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-10-09 | DOI: 10.1109/TPDS.2025.3619775
Yuzhe Luo;Ji Qi;Ling Li;Ruizhi Chen;Xiaoyu Wu;Limin Cheng;Chen Zhao
{"title":"Chorus: Robust Multitasking Local Client-Server Collaborative Inference With Wi-Fi 6 for AIoT Against Stochastic Congestion Delay","authors":"Yuzhe Luo;Ji Qi;Ling Li;Ruizhi Chen;Xiaoyu Wu;Limin Cheng;Chen Zhao","doi":"10.1109/TPDS.2025.3619775","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3619775","url":null,"abstract":"The rapid growth of AIoT devices brings huge demands for DNNs deployed on resource-constrained devices. However, the intensive computation and high memory footprint of DNN inference make it difficult for the AIoT devices to execute the inference tasks efficiently. In many widely deployed AIoT use cases, multiple local AIoT devices launch DNN inference tasks randomly. Although local collaborative inference has been proposed to accelerate DNN inference on local devices with limited resources, multitasking local collaborative inference, which is common in AIoT scenarios, has not been fully studied in previous works. We consider multitasking local client-server collaborative inference (MLCCI), which achieves efficient DNN inference by offloading the inference tasks from multiple AIoT devices to a more powerful local server with parallel pipelined execution streams through Wi-Fi 6. Our optimization goal is to minimize the mean end-to-end latency of MLCCI. Based on the experiment results, we identify three key challenges: high communication costs, high model initialization latency, and congestion delay brought by task interference. We analyze congestion delay in MLCCI and its stochastic fluctuations with queuing theory and propose Chorus, a high-performance adaptive MLCCI framework for AIoT devices, to minimize the mean end-to-end latency of MLCCI against stochastic congestion delay. Chorus generates communication-efficient model partitions with heuristic search, uses a prefetch-enabled two-level LRU cache to accelerate model initialization on the server, reduces congestion delay and its short-term fluctuations with execution stream allocation based on the cross-entropy method, and finally achieves efficient computation offloading with reinforcement learning. We established a system prototype, which statistically simulated many virtual clients with limited physical client devices to conduct performance evaluations, for Chorus with real devices. The evaluation results for various workload levels show that Chorus achieved an average of <inline-formula><tex-math>$1.4times$</tex-math></inline-formula>, <inline-formula><tex-math>$1.3times$</tex-math></inline-formula>, and <inline-formula><tex-math>$2times$</tex-math></inline-formula> speedup over client-only inference, and server-only inference with LRU and MLSH, respectively.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 12","pages":"2706-2723"},"PeriodicalIF":6.0,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SSpMM: Efficiently Scalable SpMM Kernels Across Multiple Generations of Tensor Cores
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-10-01 | DOI: 10.1109/TPDS.2025.3616981
Zeyu Xue;Mei Wen;Jianchao Yang;Minjin Tang;Zhongdi Luo;Jing Feng;Yang Shi;Zhaoyun Chen;Junzhong Shen;Johannes Langguth
{"title":"SSpMM: Efficiently Scalable SpMM Kernels Across Multiple Generations of Tensor Cores","authors":"Zeyu Xue;Mei Wen;Jianchao Yang;Minjin Tang;Zhongdi Luo;Jing Feng;Yang Shi;Zhaoyun Chen;Junzhong Shen;Johannes Langguth","doi":"10.1109/TPDS.2025.3616981","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3616981","url":null,"abstract":"Sparse-Dense Matrix-Matrix Multiplication (SpMM) has emerged as a foundational primitive in HPC and AI. Recent advancements have aimed to accelerate SpMM by harnessing the powerful Tensor Cores found in modern GPUs. However, despite these efforts, existing methods frequently encounter performance degradation when ported across different Tensor Core architectures. Recognizing that scalable SpMM across multiple generations of Tensor Cores relies on the effective use of general-purpose instructions, we have meticulously developed a SpMM library named <italic>SSpMM</i>. However, a significant conflict exists between granularity and performance in current Tensor Core instructions. To resolve this, we introduce the innovative <italic>Transpose Mapping Scheme</i>, which elegantly implements fine-grained kernels using coarse-grained instructions. Additionally, we propose the <italic>Register Shuffle Method</i> to further enhance performance. Finally, we introduce <italic>Sparse Vector Compression</i>, a technique that ensures our kernels are scalable with both structured and unstructured sparsity. Our experimental results, conducted on four generations of Tensor Core GPUs using over 3,000 sparse matrices from well-established matrix collections, demonstrate that <italic>SSpMM</i> achieves an average speedup of 2.04 ×, 2.81 ×, 2.07 ×, and 1.87 ×, respectively, over the state-of-the-art SpMM solution. Furthermore, we have integrated <italic>SSpMM</i> into PyTorch, achieving a 1.81 × speedup in end-to-end Transformer inference compared to <italic>cuDNN</i>.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 12","pages":"2652-2667"},"PeriodicalIF":6.0,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-09-26 | DOI: 10.1109/TPDS.2025.3611880
Matthew Weidner;Martin Kleppmann
{"title":"The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing","authors":"Matthew Weidner;Martin Kleppmann","doi":"10.1109/TPDS.2025.3611880","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3611880","url":null,"abstract":"Most existing algorithms for replicated lists, which are widely used in collaborative text editors, suffer from a problem: when two users concurrently insert text at the same position in the document, the merged outcome may interleave the inserted text passages, resulting in corrupted and potentially unreadable text. The problem has gone unnoticed for decades, and it affects both CRDTs and Operational Transformation. This paper defines maximal non-interleaving, our new correctness property for replicated lists. We introduce two related CRDT algorithms, Fugue and FugueMax, and prove that FugueMax satisfies maximal non-interleaving. We also implement our algorithms and demonstrate that Fugue offers performance comparable to state-of-the-art CRDT libraries for text editing.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 11","pages":"2425-2437"},"PeriodicalIF":6.0,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Atomic Smart Contract Interoperability With High Efficiency via Cross-Chain Integrated Execution
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-09-25 | DOI: 10.1109/TPDS.2025.3614374
Chaoyue Yin;Mingzhe Li;Jin Zhang;You Lin;Qingsong Wei;Siow Mong Rick Goh
{"title":"Atomic Smart Contract Interoperability With High Efficiency via Cross-Chain Integrated Execution","authors":"Chaoyue Yin;Mingzhe Li;Jin Zhang;You Lin;Qingsong Wei;Siow Mong Rick Goh","doi":"10.1109/TPDS.2025.3614374","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3614374","url":null,"abstract":"With the development of Ethereum, numerous blockchains compatible with Ethereum’s execution environment (i.e., Ethereum Virtual Machine, EVM) have emerged. Developers can leverage smart contracts to run various complex decentralized applications on top of blockchains. However, the increasing number of EVM-compatible blockchains has introduced significant challenges in cross-chain interoperability, particularly in ensuring efficiency and atomicity for the whole cross-chain application. Existing solutions are <italic>either limited in guaranteeing overall atomicity for the cross-chain application, or inefficient due to the need for multiple rounds of cross-chain smart contract execution.</i> To address this gap, we propose <monospace>IntegrateX</monospace>, an efficient cross-chain interoperability system that ensures the overall atomicity of cross-chain smart contract invocations. The core idea is to <italic>deploy the logic required for cross-chain execution onto a single blockchain, where it can be executed in an integrated manner.</i> This allows cross-chain applications to perform all cross-chain logic efficiently within the same blockchain. <monospace>IntegrateX</monospace> consists of a <italic>cross-chain smart contract deployment protocol</i> and a <italic>cross-chain smart contract integrated execution protocol.</i> The former achieves efficient and secure cross-chain deployment by decoupling smart contract logic from state, and employing an off-chain cross-chain deployment mechanism combined with on-chain cross-chain verification. The latter ensures atomicity of cross-chain invocations through a 2PC-based mechanism, and enhances performance through transaction aggregation and fine-grained state lock. We implement a prototype of <monospace>IntegrateX</monospace>. Extensive experiments demonstrate that it reduces up to 61.2% latency compared to the state-of-the-art baseline while maintaining low gas consumption.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 12","pages":"2635-2651"},"PeriodicalIF":6.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Megapixel Approach for Efficient Execution of Irregular Wavefront Algorithms on GPUs
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-09-23 | DOI: 10.1109/TPDS.2025.3612696
Mathias Oliveira;Willian Barreiros;Renato Ferreira;Alba C. M. A. Melo;George Teodoro
{"title":"The Megapixel Approach for Efficient Execution of Irregular Wavefront Algorithms on GPUs","authors":"Mathias Oliveira;Willian Barreiros;Renato Ferreira;Alba C. M. A. Melo;George Teodoro","doi":"10.1109/TPDS.2025.3612696","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3612696","url":null,"abstract":"Morphological operations are critical in high-resolution biomedical image processing. Their efficient execution relies on an irregular flood-filling strategy consolidated in the Irregular Wavefront Propagation Pattern (IWPP). IWPP was designed for GPUs and achieved significant gains compared to previous work. Here, however, we have revisited IWPP to identify the key limitations of its GPU implementation and proposed a novel more efficient strategy. In particular, the IWPP most demanding phase consists of tracking active pixels, those contributing to the output, that are the ones processed during the execution. This computational strategy leads to irregular memory access, divergent execution, and high storage (queue) management costs. To address these aspects, we have proposed the novel execution strategy called Irregular Wavefront Megapixel Propagation Pattern (IWMPP). IWMPP introduces a coarse-grained execution approach based on fixed-size square regions (instead of pixels in IWPP), referred to as megapixels (MPs). This design reduces the number of elements tracked and enables a regular processing within MPs that, in turn, improves thread divergence and memory accesses. IWMPP introduces optimizations, such as Duplicate Megapixel Removal (DMR) to avoid MPs recomputation and Tiled-Ordered (TO) execution that enforces a semistructured MPs execution sequence to improve data propagation efficiency. Experimental results using large tissue cancer images demonstrated that the IWMPP GPU attains significant gains over the state-of-the-art (IWPP). For morphological reconstruction, fill holes, and h-maxima operations, on the RTX 4090, the IWMPP GPU is up to 17.9×, 45.6×, and 14.9× faster than IWPP GPU, respectively, while at the same time reducing memory demands. IWMPP is an important step to enable quick processing of large imaging datasets.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 11","pages":"2399-2411"},"PeriodicalIF":6.0,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11176841","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimizing Data Locality by Integrating Intermediate Data Partitioning and Reduce Task Scheduling in Spark Framework
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-09-18 | DOI: 10.1109/TPDS.2025.3611388
Mengsi He;Zhongming Fu;Zhuo Tang
{"title":"Optimizing Data Locality by Integrating Intermediate Data Partitioning and Reduce Task Scheduling in Spark Framework","authors":"Mengsi He;Zhongming Fu;Zhuo Tang","doi":"10.1109/TPDS.2025.3611388","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3611388","url":null,"abstract":"Data locality is crucial for distributed computing systems (e.g., Spark and Hadoop), which is the main factor considered in the task scheduling. Simultaneously, the effects of data locality on reduce tasks are determined by the intermediate data partitioning. While suffering from the problem of data skew, the existing intermediate data partitioning methods only achieves load balancing for reduce tasks. To address the problem, this paper optimizes the data locality for reduce tasks by integrating intermediate data partitioning and task scheduling in Spark framework. First, it presents a distribution skew model to divide the key clusters into skewed and non-skewed distribution. Then, a data locality and load balancing-aware intermediate data partitioning method is proposed, where a priority allocation strategy for the key clusters with skewed distribution is presented, and a balanced allocation strategy for the key clusters with non-skewed distribution is presented. Finally, it proposes a data locality-aware reduce task scheduling algorithm, where an online self-adaptive NARX (nonlinear autoregressive with external input) model is developed to predict the idle time of node. It can ensure that the delayed scheduling decision made can complete the data transmission of reduce tasks earlier. We implement our proposals in Spark-3.5.1 and evaluate the performance using several representative benchmarks. Experimental results indicate that the proposed method and algorithm can reduce the job/application running time by approximately 4% to 46% and decrease the total volume of data transmission by approximately 8% to 54%.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 11","pages":"2383-2398"},"PeriodicalIF":6.0,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145141752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FedSR: A Semi-Decentralized Federated Learning Framework for Non-IID Data Based on Incremental Subgradient Optimization
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-09-17 | DOI: 10.1109/TPDS.2025.3611304
Jianjun Huang;Hao Huang;Li Kang;Lixin Ye
{"title":"FedSR: A Semi-Decentralized Federated Learning Framework for Non-IID Data Based on Incremental Subgradient Optimization","authors":"Jianjun Huang;Hao Huang;Li Kang;Lixin Ye","doi":"10.1109/TPDS.2025.3611304","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3611304","url":null,"abstract":"In the Industrial Internet of Things (IoT), data heterogeneity across different devices poses a huge challenge to federated learning techniques, significantly reducing the performance of federated learning models. Additionally, the large number of devices participating in IoT federated learning and training imposes a substantial computational burden on cloud servers. Current federated learning research primarily adopts centralized or discentralized learning architectures, which cannot fundamentally solve these issues. To address this, we propose a novel semi-centralized cloud-edge-device hierarchical federate learning framework that integrated both centralized and decentralized federated learning approaches. Specifically, only a subset of adjacent devices forms small-scale ring clusters, and the cloud server aggregates the ring models to construct a global model. To mitigate the impact of data heterogeneity across devices, we use an incremental subgradient optimization algorithm within each ring cluster to enhance the generalization ability of the ring cluster models. Extensive experiments demonstrate that our approach effectively reduces the impact of data heterogeneity, improves model performance, and significantly alleviates the communication burden on cloud servers compared to centralized and discentralized federated learning frameworks. Indeed, the framework proposed in this paper aims to balance the strengths of centralized federated learning and ring federated learning. It achieves superior performance in addressing the data non-IID problem compared to centralized federated learning architectures while also mitigating issues associated with excessively large rings in ring architectures.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 12","pages":"2693-2705"},"PeriodicalIF":6.0,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
New Scheduling Algorithm and Analysis for Partitioned Periodic DAG Tasks on Multiprocessors
IF 6.0 | Zone 2 | Computer Science
IEEE Transactions on Parallel and Distributed Systems | Pub Date: 2025-09-17 | DOI: 10.1109/TPDS.2025.3611446
Haochun Liang;Xu Jiang;Junyi Liu;Xiantong Luo;Songran Liu;Nan Guan;Wang Yi
{"title":"New Scheduling Algorithm and Analysis for Partitioned Periodic DAG Tasks on Multiprocessors","authors":"Haochun Liang;Xu Jiang;Junyi Liu;Xiantong Luo;Songran Liu;Nan Guan;Wang Yi","doi":"10.1109/TPDS.2025.3611446","DOIUrl":"https://doi.org/10.1109/TPDS.2025.3611446","url":null,"abstract":"Real-time systems are increasingly shifting from single processors to multiprocessors, where software must be parallelized to fully exploit the additional computational power. While the scheduling of real-time parallel tasks modeled as directed acyclic graphs (DAGs) has been extensively studied in the context of global scheduling, the scheduling and analysis of real-time DAG tasks under partitioned scheduling remain far less developed compared to the traditional scheduling of sequential tasks. Existing approaches primarily target plain fixed-priority partitioned scheduling and often rely on self-suspension–based analysis, which limits opportunities for further optimization. In particular, such methods fail to fully leverage fine-grained scheduling management that could improve schedulability. In this paper, we propose a novel approach for scheduling periodic DAG tasks, in which each DAG task is transformed into a set of real-time transactions by incorporating mechanisms for enforcing release offsets and intra-task priority assignments. We further develop corresponding analysis techniques and partitioning algorithms. Through comprehensive experiments, we evaluate the real-time performance of the proposed methods against state-of-the-art scheduling and analysis techniques. The results demonstrate that our approach consistently outperforms existing methods for scheduling periodic DAG tasks across a wide range of parameter settings.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 12","pages":"2621-2634"},"PeriodicalIF":6.0,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0