IEEE Transactions on Parallel and Distributed Systems: Latest Articles

Beehive: Decentralised High-Frequency Small Tasks Scheduling in Large Clusters
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-22 DOI: 10.1109/TPDS.2025.3563457
Yuxia Cheng;Linfeng Xu;Tongkai Yang;Wei Wu;Zhiqiang Lin;Antong Yu;Wenzhi Chen
Abstract: Data centers struggle with growing cluster sizes and rising submissions of short-lived, high-frequency tasks that cause performance bottlenecks in task scheduling. Existing centralized and distributed scheduling systems fall short in meeting performance requirements due to computational overload on the scheduler, cluster state management overhead, and scheduling conflicts. To address these challenges, this article introduces Beehive, a novel lightweight decentralized scheduling framework. In Beehive, each cluster node can schedule tasks within its local neighborhood, effectively reducing resource management overhead and scheduling conflicts. Moreover, all nodes are interconnected in a small-world network, an efficient structure that allows tasks to access resources across the entire cluster through global routing. This lightweight design enables Beehive to scale efficiently, supporting over 10,000 nodes and up to 80,000 task submissions per second without causing single-node scheduling bottlenecks. Experimental results demonstrate that Beehive significantly reduces scheduling latency. Specifically, 99% of tasks are scheduled within 100 milliseconds, and scheduling throughput can increase linearly with the number of nodes. Compared to existing centralized and distributed scheduling frameworks, Beehive substantially alleviates scheduling bottlenecks, particularly for high-frequency, short-lived tasks.
Vol. 36, No. 6, pp. 1326-1337
Citations: 0
$AWB^+$-$Tree$: A Novel Width-Based Index Structure Supporting Hybrid Matching for Large-Scale Content-Based Pub/Sub Systems
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-16 DOI: 10.1109/TPDS.2025.3561714
Zhengyu Liao;Shiyou Qian;Zhonglong Zheng;Jian Cao;Guangtao Xue;Minglu Li
Abstract: Event matching is a key component in a large-scale content-based publish/subscribe system. The performance of most existing algorithms is easily affected by the subscription matching probability. In this article, we propose a new data structure, named $AWB^+$-$Tree$, which is based on the width of the predicates, to efficiently index the subscriptions. The most notable feature of $AWB^+$-$Tree$ is its ability to combine the advantages of different matching methods, thus achieving high and robust performance in dynamic environments. First, we implement both a forward matching method (AFM) and a backward matching method (ABM) based on $AWB^+$-$Tree$. Then, we introduce a hybrid matching method (AHM) that combines AFM and ABM. Moreover, we extend $AWB^+$-$Tree$ in three aspects: approximate matching, string type matching, and fine-grained parallelization. We conducted extensive experiments to evaluate the performance of the proposed matching algorithms on synthetic and real-world datasets. The experiment results reveal that AHM achieves a reduction in matching time by up to 53.8% compared to the state-of-the-art method. Additionally, AHM exhibits improved performance robustness, with up to a 76.9% reduction in terms of the standard deviation of matching time. Particularly in dynamic scenarios, AHM is at least 2.3 times faster and 41.3% more stable than its counterparts. Furthermore, by implementing parallelization, the matching speed of 8 threads can be accelerated by 4.16 times compared to the single-thread matching speed.
Vol. 36, No. 6, pp. 1268-1281
Citations: 0
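The forward/backward distinction mentioned in the abstract above can be illustrated with a brute-force sketch. This is not the $AWB^+$-$Tree$ index or the AFM/ABM/AHM methods themselves; it only assumes, for illustration, that a subscription is a conjunction of closed-interval predicates. All names (`Subscription`, `forward_match`, `backward_match`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a subscription is a conjunction of interval predicates,
# one closed interval [low, high] per constrained attribute.
Subscription = Dict[str, Tuple[float, float]]
Event = Dict[str, float]


def forward_match(subs: List[Subscription], event: Event) -> List[int]:
    """Forward matching: count satisfied predicates; a subscription matches
    when every one of its predicates is satisfied."""
    matched = []
    for i, sub in enumerate(subs):
        satisfied = sum(
            1
            for attr, (lo, hi) in sub.items()
            if attr in event and lo <= event[attr] <= hi
        )
        if satisfied == len(sub):
            matched.append(i)
    return matched


def backward_match(subs: List[Subscription], event: Event) -> List[int]:
    """Backward matching: mark every subscription with at least one
    unsatisfied predicate; whatever stays unmarked is a match."""
    unmatched = [False] * len(subs)
    for i, sub in enumerate(subs):
        for attr, (lo, hi) in sub.items():
            if attr not in event or not (lo <= event[attr] <= hi):
                unmatched[i] = True
                break
    return [i for i, bad in enumerate(unmatched) if not bad]


subs = [
    {"price": (10, 20)},
    {"price": (0, 5), "qty": (1, 100)},
    {"qty": (50, 60)},
]
event = {"price": 12, "qty": 55}
print(forward_match(subs, event), backward_match(subs, event))
```

Both functions return the same set of subscriptions; they differ only in whether work is driven by satisfied or by unsatisfied predicates, which is why the cost of each style tends to depend on the matching probability.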
Raccoon: Lightweight Support for Comprehensive Control Flows in Reconfigurable Spatial Architectures
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-15 DOI: 10.1109/TPDS.2025.3561145
Xiangyu Kong;Yi Huang;Longlong Chen;Jianfeng Zhu;Liangwei Li;Xingchen Man;Mingyu Gao;Shaojun Wei;Leibo Liu
Abstract: Coarse-grained reconfigurable arrays (CGRAs) have emerged as promising candidates for digital signal processing, biomedical, and automotive applications, where energy efficiency and flexibility are paramount. Yet existing CGRAs suffer from the Amdahl bottleneck caused by constrained control handling via either off-device communication or expensive tag-matching mechanisms. More importantly, mapping control flow onto CGRAs is extremely arduous and time-consuming due to intricate instruction structures and hardware mechanisms. To counteract these limitations, we propose Raccoon, a portable and lightweight framework for CGRAs targeting vast control flows. Raccoon comprises a comprehensive approach that spans microarchitecture, HW/SW interface, and compiler aspects. Regarding microarchitecture, Raccoon incorporates specialized infrastructure for branch- and loop-level control patterns with concise execution mechanisms. The HW/SW interface of Raccoon includes well-characterized abstractions and instruction sets tailored for easy compilation, featuring custom operators and architectural models for control-oriented units. On the compiler front, Raccoon integrates advanced control handling techniques and employs a portable mapper leveraging reinforcement learning and Monte Carlo tree search. This enables agile mapping and optimization of the entire program, ensuring efficient execution and high-quality results. Through the cohesive co-design, Raccoon can empower various CGRAs with robust control-flow handling capabilities, surpassing conventional tagged mechanisms in terms of hardware efficiency and compiler adaptability. Evaluation results show that Raccoon achieves up to a 5.78× improvement in energy efficiency and a 2.24× reduction in cycle count over state-of-the-art CGRAs. Raccoon stands out for its versatility in managing intricate control flows and showcases remarkable portability across diverse CGRA architectures.
Vol. 36, No. 6, pp. 1294-1310
Citations: 0
ChunkFunc: Dynamic SLO-Aware Configuration of Serverless Functions
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-09 DOI: 10.1109/TPDS.2025.3559021
Thomas Pusztai;Stefan Nastic
Abstract: Serverless computing promises to be a cost effective form of on demand computing. To fully utilize its cost saving potential, workflows must be configured with the appropriate amount of resources to meet their response time Service Level Objective (SLO), while keeping costs at a minimum. Since determining and updating these configuration models manually is a nontrivial and error prone task, researchers have developed solutions for automatically finding configurations that meet the aforementioned requirements. However, our initial experiments show that even when following best practices and using state-of-the-art configuration tools, resources may still be considerably over- or underprovisioned, depending on the size of functions’ input payload. In this paper we present ChunkFunc, an SLO- and input data-aware framework for tuning serverless workflows. Our main contributions include: i) an SLO- and input size-aware function performance model for optimized configurations in serverless workflows, ii) ChunkFunc Profiler, an auto-tuned, Bayesian Optimization-guided profiling mechanism for profiling serverless functions with typical input data sizes to build a performance model, and iii) ChunkFunc Workflow Optimizer, which uses these models to determine an input size dependent configuration for each serverless function in a workflow to meet the SLO, while keeping costs to a minimum. We evaluate ChunkFunc on real-life serverless workflows and compare it to two state-of-the-art solutions, showing that it increases SLO adherence by a factor of 1.04 to 2.78, depending on the workflow, and reduces costs by up to 61%.
Vol. 36, No. 6, pp. 1237-1252
Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10959103
Citations: 0
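The idea of an input-size-dependent configuration described in the abstract above can be sketched for a single function as follows. This is a toy illustration, not ChunkFunc's profiler or workflow optimizer: the `Config` fields, the linear `predicted_ms` model, and the prices are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    memory_mb: int        # hypothetical resource profile
    price_per_ms: float   # hypothetical cost rate for that profile


def predicted_ms(cfg: Config, input_bytes: int) -> float:
    """Toy performance model (an assumption): a fixed overhead plus a term
    that grows with input size and shrinks with allocated memory."""
    return 50.0 + input_bytes / cfg.memory_mb


def cheapest_config(
    configs: List[Config], input_bytes: int, slo_ms: float
) -> Optional[Config]:
    """Return the lowest-cost config whose predicted time meets the SLO,
    or None when no config is predicted to meet it."""
    feasible = [c for c in configs if predicted_ms(c, input_bytes) <= slo_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: predicted_ms(c, input_bytes) * c.price_per_ms)


configs = [Config(128, 1.0), Config(512, 4.0), Config(1024, 8.0)]
small = cheapest_config(configs, input_bytes=10_000, slo_ms=200.0)
large = cheapest_config(configs, input_bytes=100_000, slo_ms=200.0)
print(small, large)
```

Under this toy model the small payload is served most cheaply by the 128 MB profile, while the larger payload needs the 1024 MB profile to stay within the same SLO, which is the over- or underprovisioning effect a single static configuration would miss.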
Productivity, Portability, Performance, and Reproducibility: Data-Centric Python
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-09 DOI: 10.1109/TPDS.2025.3549310
Alexandros Nikolaos Ziogas;Timo Schneider;Tal Ben-Nun;Alexandru Calotoiu;Tiziano De Matteis;Johannes de Fine Licht;Luca Lavarini;Torsten Hoefler
Abstract: Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High-Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. This work presents a workflow that retains Python’s high productivity while achieving portable performance across different architectures. The workflow’s key features are HPC-oriented language extensions and a set of automatic optimizations powered by a data-centric intermediate representation. We show performance results and scaling across CPU, GPU, FPGA, and the Piz Daint supercomputer (up to 23,328 cores), with 2.47x and 3.75x speedups over previous-best solutions, first-ever Xilinx and Intel FPGA results of annotated Python, and up to 93.16% scaling efficiency on 512 nodes. Our benchmarks were reproduced in the Student Cluster Competition (SCC) during the Supercomputing Conference (SC) 2022. We present and discuss the student teams’ results.
Vol. 36, No. 5, pp. 804-820
Citations: 0
Guest Editorial: Special Section on SC22 Student Cluster Competition
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-09 DOI: 10.1109/TPDS.2025.3549281
Omer Rana;Josef Spillner;Stephen Leak;Gerald F Lofstead II;Rafael Tolosana Calasanz
Vol. 36, No. 5, p. 803
Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10960278
Citations: 0
Symmetric Properties and Two Variants of Shuffle-Cubes
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-08 DOI: 10.1109/TPDS.2025.3558885
Huazhong Lü;Kai Deng;Xiaomei Yang
Abstract: Li et al. in [Inf. Process. Lett. 77 (2001) 35–41] proposed the shuffle-cube $SQ_{n}$, a hypercube variant, as an attractive interconnection network topology for massive parallel and distributed systems. Diameter and symmetry are two desirable measures of network performance in terms of transmission delay and routing algorithms. Almost all $n$-regular hypercube variants of dimension $n$ have diameter not less than $n/2$. The diameter of the shuffle-cube is approximately a quarter of the diameter of the hypercube of the same dimension, making it a competitive candidate network topology. So far, symmetric properties of the shuffle-cube remain unknown. In this paper, we show that $SQ_{n}$ is not vertex-transitive for $n > 2$, which is not an appealing property in interconnection networks. This shortcoming limits the practical application of the shuffle-cube. To overcome this limitation, two novel variants of the shuffle-cube, namely simplified shuffle-cube $SSQ_{n}$ and balanced shuffle-cube $BSQ_{n}$, are introduced, and their vertex-transitivity is proved simultaneously. By proposing the shuffle-cube-like graph, we obtain that both $SSQ_{n}$ and $BSQ_{n}$ are maximally connected, implying high connectivity similar to the hypercube. Additionally, the super-connectivity, a refined parameter of connectivity, of $SSQ_{n}$ and $BSQ_{n}$ is also determined. Then, by vertex-transitivity of $SSQ_{n}$ and $BSQ_{n}$, routing algorithms of $SSQ_{n}$ and $BSQ_{n}$ are given for all $n > 2$ respectively. We show that both $SSQ_{n}$ and $BSQ_{n}$ possess Hamiltonian cycle embedding for all $n > 2$, and we also show that $SSQ_{n}$ is Hamiltonian-connected. It is noticeable that each vertex of $SSQ_{n}$ is contained in exactly one clique of size four, making it also a viable interconnection topo…
Vol. 36, No. 6, pp. 1282-1293
Citations: 0
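The abstract above uses the ordinary hypercube as its baseline for diameter and symmetry. The sketch below covers only that baseline: it builds the $n$-dimensional hypercube $Q_n$ implicitly (vertices are $n$-bit integers, neighbors differ in one bit) and measures its diameter by breadth-first search. It does not construct $SQ_{n}$, $SSQ_{n}$, or $BSQ_{n}$, whose definitions are not given in this listing; the function names are illustrative.

```python
from collections import deque
from typing import List


def hypercube_neighbors(v: int, n: int) -> List[int]:
    """Neighbors of vertex v in Q_n: flip each of the n bits in turn."""
    return [v ^ (1 << i) for i in range(n)]


def eccentricity(source: int, n: int) -> int:
    """Largest shortest-path distance from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in hypercube_neighbors(u, n):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(n: int) -> int:
    """Diameter of Q_n: the maximum eccentricity over all 2^n vertices."""
    return max(eccentricity(v, n) for v in range(1 << n))


print(diameter(4))
```

Every vertex of $Q_n$ has the same eccentricity $n$, which is consistent with the hypercube being vertex-transitive; equal eccentricities alone do not prove vertex-transitivity, so this check is a necessary condition only.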
Cube-fx: Mapping Taylor Expansion Onto Matrix Multiplier-Accumulators of Huawei Ascend AI Processors
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-03 DOI: 10.1109/TPDS.2025.3557444
Yifeng Tang;Huaman Zhou;Zhuoran Ji;Cho-Li Wang
Abstract: Taylor expansion, a mature method for function evaluations used in Artificial Intelligence (AI) applications, approximates functions with polynomials. In addition to the function evaluations, AI applications require massive matrix multiplications, inspiring manufacturers to propose AI processors with matrix multiplier-accumulators (MACs). However, compared with the powerful Matrix MACs, the vectorized units of the AI processors cannot efficiently carry the existing Taylor expansion implementation of Single Instruction Multiple Data (SIMD) parallelism. Leveraging the Matrix MACs for Taylor expansion becomes an ideal direction. In previous studies, migrating optimized algorithms to the Matrix MACs requires matrix generation during the runtime. The generation is expensive and can even cancel the acceleration brought by the Matrix MACs on the AI processors, a problem that Taylor expansion also suffers from. This article presents Cube-fx, a mapping algorithm of Taylor expansion for multiple functions onto Matrix MACs. Cube-fx expresses the building and computation in matrix multiplications without inefficient dynamic matrix generation. On Huawei Ascend processors, Cube-fx achieves an average 1.64× speedup over vectorized Horner’s Method while reducing vectorized operations by 56.38%.
Vol. 36, No. 6, pp. 1115-1129
Citations: 0
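As background for the abstract above, the sketch below contrasts Horner's method with a matrix-style formulation of the same Taylor polynomial, using $e^x$ as the example function. It is not Cube-fx's mapping: the `powers_matrix` helper builds the matrix of powers at run time, which is the kind of dynamic matrix generation the abstract describes as expensive. The code only shows that the two formulations are algebraically equivalent; all function names are illustrative.

```python
import math
from typing import List


def taylor_exp_coeffs(degree: int) -> List[float]:
    """Maclaurin coefficients of exp(x): c_k = 1/k! for k = 0..degree."""
    return [1.0 / math.factorial(k) for k in range(degree + 1)]


def horner(coeffs: List[float], x: float) -> float:
    """Evaluate sum(c_k * x**k) with one multiply-add per coefficient."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def powers_matrix(xs: List[float], degree: int) -> List[List[float]]:
    """Row i holds [1, x_i, x_i**2, ..., x_i**degree]."""
    rows = []
    for x in xs:
        row = [1.0]
        for _ in range(degree):
            row.append(row[-1] * x)
        rows.append(row)
    return rows


def matvec(matrix: List[List[float]], vec: List[float]) -> List[float]:
    """Plain matrix-vector product: one dot product per row."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


xs = [0.0, 0.5, 1.0]
coeffs = taylor_exp_coeffs(10)
by_horner = [horner(coeffs, x) for x in xs]
by_matmul = matvec(powers_matrix(xs, 10), coeffs)
print(by_horner, by_matmul)
```

Horner's method is a sequential chain of multiply-adds per input, whereas the second form turns the evaluation of many inputs into one product of a powers matrix with a coefficient vector, the shape of computation that matrix units are built for.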
Distributed and Adaptive Partitioning for Large Graphs in Geo-Distributed Data Centers
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-03 DOI: 10.1109/TPDS.2025.3557610
Haobin Tan;Yao Xiao;Amelie Chi Zhou;Kezhong Lu;Xuan Yang
Abstract: Graph partitioning is of great importance to optimizing the performance and cost of geo-distributed graph analytics applications. However, it is non-trivial to obtain efficient and effective partitioning due to the challenges brought by the large graph scales, dynamic graph changes and the network heterogeneity in geo-distributed data centers (DCs). Existing studies usually adopt heuristic-based methods to achieve fast and balanced partitioning for large graphs, which are not powerful enough to address the complexity in our problem. Further, graph structures of many applications can change at various frequencies. Dynamic partitioning methods usually focus on achieving low latency to quickly adapt to changes, which unfortunately sacrifices partitioning effectiveness. Also, such methods are not aware of the dynamicity of graphs and can over sacrifice effectiveness for unnecessarily low latency. To address the limitations of existing studies, we propose DistRLCut, a novel graph partitioner which leverages Multi-Agent Reinforcement Learning (MARL) to solve the complexity of the partitioning problem. To achieve fast partitioning for large graphs, DistRLCut adapts MARL to a distributed implementation which significantly accelerates the learning process. Further, DistRLCut incorporates two techniques to trade-off between partitioning effectiveness and efficiency, including local training and agent sampling. By adaptively tuning the number of local training iterations and the agent sampling rate, DistRLCut is able to achieve good partitioning results within an overhead constraint required by graph dynamicity. Experiments using real cloud DCs and real-world graphs show that, compared to state-of-the-art static partitioning methods, DistRLCut improves the performance of geo-distributed graph analytics by 11%-95%. DistRLCut can partition over 28 million edges per second, showcasing its scalability for large graphs. With varying graph changing frequencies, DistRLCut can improve the performance by up to 71% compared to state-of-the-art dynamic partitioning.
Vol. 36, No. 6, pp. 1161-1174
Citations: 0
OneOS: Distributed Operating System for the Edge-to-Cloud Continuum
IF 5.6, Zone 2, Computer Science
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-04-03 DOI: 10.1109/TPDS.2025.3557747
Kumseok Jung;Julien Gascon-Samson;Sathish Gopalakrishnan;Karthik Pattabiraman
Abstract: Application developers often need to employ a combination of software such as communication middleware and cloud-based services to deal with the challenges of heterogeneity and network dynamism in the edge-to-cloud continuum. Consequently, developers write extra glue code peripheral to the application’s core business logic, to provide interoperability between interacting software frameworks. Each software framework comes with its own framework-specific API, and as technology evolves, the developer must keep up with the changing APIs by updating the glue code in their application. Thus, framework-specific APIs hinder interoperability and cause technology fragmentation. We propose a design of a middleware-based distributed operating system (OS) called OneOS to realize a computing paradigm that alleviates such interoperability challenges. OneOS provides a single system image of the distributed computing platform, and transparently provides interoperability between software components through the standard POSIX API. Using OneOS’s domain-specific language, users can compose complex distributed applications from legacy POSIX programs. OneOS tolerates failures by adopting a distributed checkpoint-restore algorithm. We evaluate the performance of OneOS against an open-source IoT Platform, ThingsJS, using an IoT stream processing benchmark suite, and a video processing application. OneOS executes the programs about 3x faster than ThingsJS, reduces the code size by about 22%, and recovers the state of failed applications within 1 s upon detecting their failure.
Vol. 36, No. 6, pp. 1175-1192
Citations: 0