IEEE Transactions on Parallel and Distributed Systems: Latest Articles

Fair Coflow Scheduling via Controlled Slowdown
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-20 DOI: 10.1109/TPDS.2024.3446188
Francesco De Pellegrini;Vaibhav Kumar Gupta;Rachid El Azouzi;Serigne Gueye;Cedric Richier;Jeremie Leguay
Abstract: The average coflow completion time (CCT) is the standard performance metric in coflow scheduling. However, standard CCT minimization may introduce unfairness between the data transfer phase of different computing jobs. Thus, while progress guarantees have been introduced in the literature to mitigate this fairness issue, the trade-off between fairness and efficiency of data transfer is hard to control. This paper introduces a fairness framework for coflow scheduling based on the concept of slowdown, i.e., the performance loss of a coflow compared to isolation. By controlling the slowdown it is possible to enforce a target coflow progress while minimizing the average CCT. In the proposed framework, the minimum slowdown for a batch of coflows can be determined in polynomial time. By showing the equivalence with Gaussian elimination, slowdown constraints are introduced into primal-dual iterations of the CoFair algorithm. The algorithm extends the class of the $\sigma$-order schedulers to solve the fair coflow scheduling problem in polynomial time. It provides a 4-approximation of the average CCT w.r.t. an optimal scheduler. Extensive numerical results demonstrate that this approach can trade off average CCT for slowdown more efficiently than existing state-of-the-art schedulers.
Vol. 35, No. 12, pp. 2347-2360. Publication type: Journal Article.
Citations: 0
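To make the slowdown notion in the Fair Coflow Scheduling abstract above concrete, here is a minimal Python sketch. It assumes slowdown is measured as the ratio of a coflow's CCT under a shared schedule to its CCT when run in isolation. The abstract only describes slowdown as a performance loss relative to isolation, so the ratio form, the function names, and the sample numbers are illustrative assumptions, and the sketch does not implement the CoFair algorithm.

```python
from typing import Sequence, Tuple


def slowdown(cct_shared: float, cct_isolated: float) -> float:
    """Assumed definition: shared CCT divided by isolated CCT (1.0 = no loss)."""
    if cct_isolated <= 0:
        raise ValueError("isolated CCT must be positive")
    return cct_shared / cct_isolated


def average_cct_and_max_slowdown(
    ccts_shared: Sequence[float], ccts_isolated: Sequence[float]
) -> Tuple[float, float]:
    """Return the average CCT and the worst slowdown for a batch of coflows."""
    if not ccts_shared or len(ccts_shared) != len(ccts_isolated):
        raise ValueError("need two non-empty sequences of equal length")
    slowdowns = [slowdown(s, i) for s, i in zip(ccts_shared, ccts_isolated)]
    return sum(ccts_shared) / len(ccts_shared), max(slowdowns)


if __name__ == "__main__":
    # Hypothetical batch of two coflows, CCTs in seconds.
    avg_cct, worst = average_cct_and_max_slowdown([4.0, 6.0], [2.0, 6.0])
    print(avg_cct, worst)
```

A scheduler that bounds the worst slowdown would constrain the second value while trying to keep the first one low; how that is done is the subject of the paper and is not shown here.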
Privacy-Preserving Data Selection for Horizontal and Vertical Federated Learning
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-19 DOI: 10.1109/TPDS.2024.3439709
Lan Zhang;Anran Li;Hongyi Peng;Feng Han;Fan Huang;Xiang-Yang Li
Abstract: Federated learning (FL) enables distributed participants to collaboratively train a machine learning model without accessing their local data. In FL systems, the selection of training samples has a significant impact on model performance, e.g., selecting participants whose datasets have low-quality samples and features would result in low-accuracy, unstable models. In this work, we aim to solve the problem of selecting a collection of high-quality training samples for a given FL task under a monetary budget. We propose a holistic design to efficiently select high-quality samples while preserving the privacy of participants' local data and the server's label set. We propose an efficient hierarchical sample selection mechanism to select relevant clients and their samples before training for horizontal federated learning (HFL). It uses the determinantal point process (DPP) to select statistically homogeneous and content-diverse clients and samples. Besides, we propose a private set intersection (PSI) based scheme to filter relevant features for the target vertical federated learning (VFL) task. Finally, during training, an erroneous-aware importance-based selection is proposed to dynamically select important clients and samples to accelerate model convergence. We verify the merits of our proposed solution with extensive experiments on a real AIoT system with 50 clients. The experimental results validate that our solution achieves accurate and efficient selection of high-quality data, and consequently an FL model with faster convergence and higher accuracy.
Vol. 35, No. 11, pp. 2054-2068. Publication type: Journal Article.
Citations: 0
Logical Synchrony and the Bittide Mechanism
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-16 DOI: 10.1109/TPDS.2024.3444739
Sanjay Lall;Călin Caşcaval;Martin Izzard;Tammo Spalink
Abstract: We introduce logical synchrony, a framework that allows distributed computing to be coordinated as tightly as in synchronous systems without the distribution of a global clock or any reference to universal time. We develop a model of events called a logical synchrony network, in which nodes correspond to processors and every node has an associated local clock which generates the events. We construct a measure of logical latency and develop its properties. A further model, called a multiclock network, is then analyzed and shown to be a refinement of the logical synchrony network. We present the bittide mechanism as an instantiation of multiclock networks, and discuss the clock control mechanism that ensures that buffers do not overflow or underflow. Finally we give conditions under which a logical synchrony network has an equivalent synchronous realization.
Vol. 35, No. 11, pp. 1936-1948. Publication type: Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10638228
Citations: 0
Paired Many-to-Many 2-Disjoint Path Covers in Meshes
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-16 DOI: 10.1109/TPDS.2024.3445283
Fatemeh Keshavarz-Kohjerdi
Abstract: In the paired many-to-many $k$-disjoint path cover ($k$-DPC) problem, given a set of $k$ pairs of vertices $(s_{i},t_{i})$, $1 \leqslant i \leqslant k$, of a graph $G$ we want to find $k$ simple vertex-disjoint paths whose end-vertices are these $k$ pairs, such that each vertex of $G$ is covered by a path. This problem is a well-known problem in parallel processing and is a generalization of the well-known Hamiltonian $(s,t)$-path problem, which is equal to 1-DPC. In this paper, we consider the paired many-to-many 2-disjoint path cover problem (2-DPC) in meshes (rectangular grids). We give the necessary conditions for existence of such covers, and present a linear-time algorithm to compute them. Although the paired many-to-many $k$-disjoint path cover problem is well-known in parallel processing, our motivation to study this problem is its application in solving the Hamiltonian path problem in solid grid graphs. We consider the case where the pairs of vertices are on the outer face of the graph.
Vol. 35, No. 10, pp. 1854-1866. Publication type: Journal Article.
Citations: 0
FlexRaft: Exploiting Flexible Erasure Coding for Minimum-Cost Consensus and Fast Recovery
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-14 DOI: 10.1109/TPDS.2024.3443424
Mi Zhang;Qihan Kang;Patrick P. C. Lee
Abstract: Consensus protocols like Paxos and Raft provide data consistency and fault tolerance for distributed services. Log replication in these protocols can be supported by erasure coding, which incurs lower redundancy than full-copy replication and significantly saves network and storage costs for overall performance improvements. However, existing consensus protocols with erasure coding cannot achieve the minimum network and storage costs during log replication. We propose FlexRaft, which dynamically varies the coding scheme used in Raft based on the server status to always achieve the theoretically minimum redundancy ratio, while maintaining the same liveness as in Raft. To address the issue of an inconsistent coding scheme between the leader and its followers, we specify the prerequisite of overwriting a log entry and also allow the leader and its followers to exactly track the coding scheme being used. We further extend FlexRaft into FlexRaft+, which provides a different storage layout to vary the coding scheme through a novel technique called re-encoding-free replication, so as to enable fast server recovery. We prove that both FlexRaft and FlexRaft+ maintain Raft safety. We implement a prototype of FlexRaft and FlexRaft+, atop which we build a distributed key-value store to show its efficacy. Experiments on Alibaba Cloud show that FlexRaft achieves the theoretically minimum network and storage costs in practice, and reduces the commit latency by 44.51% and 19.37% compared with state-of-the-art CRaft and HRaft, respectively. FlexRaft+ further reduces the commit latency when the coding scheme is being varied and improves the server recovery performance.
Vol. 35, No. 10, pp. 1826-1840. Publication type: Journal Article.
Citations: 0
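The FlexRaft abstract above contrasts the redundancy of erasure coding with that of full-copy replication. The following Python sketch shows only the generic arithmetic behind that contrast for a (k, m) erasure code, where a log entry is split into k data fragments plus m parity fragments. The function names and the example parameters are assumptions for illustration; the sketch says nothing about how FlexRaft itself chooses or varies its coding scheme.

```python
from fractions import Fraction


def replication_redundancy(num_servers: int) -> Fraction:
    """Full-copy replication stores one whole copy of the entry per server."""
    if num_servers < 1:
        raise ValueError("need at least one server")
    return Fraction(num_servers)


def erasure_redundancy(data_fragments: int, parity_fragments: int) -> Fraction:
    """A (k, m) code stores k + m fragments, each 1/k of the entry size."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(data_fragments + parity_fragments, data_fragments)


if __name__ == "__main__":
    # Hypothetical 5-server cluster: full copies versus a (3, 2) code.
    print(replication_redundancy(5))  # stored bytes per entry byte
    print(erasure_redundancy(3, 2))
```

With five servers, full replication stores five times the entry size, while a (3, 2) code stores five fragments of one third the size each, which is the kind of saving in network and storage cost the abstract refers to.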
SSRAID: A Stripe-Queued and Stripe-Threaded Merging I/O Strategy to Improve Write Performance of Serial Interface SSD RAID
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-14 DOI: 10.1109/TPDS.2024.3443083
Peixuan Li;Ping Xie;Qiang Cao
Abstract: RAID (Redundant Array of Independent Disks) has been widely used to enhance the read and write performance of existing storage systems. Existing software RAID implementations do not fully utilize the write performance of serial-interface SSDs (solid-state drives). The most popular software RAID currently is Linux Multiple-Disks (MD), and the latest software RAID is StRAID. We observe that both of these software RAID methods lead to thread contention in multi-threaded mode, especially when applied to serial-interface SSDs. Multiple threads writing to the same address can limit write performance. In this paper, we propose SSRAID, a stripe-queued and stripe-threaded merging I/O strategy. First, SSRAID segregates write requests across different stripes using a set of stripe-queues and stripe-threads to prevent interference between them. As a result, write thread contention in SSRAID is eliminated, allowing stripe-threads to maintain the highest efficiency of parallelism. Second, SSRAID can merge write requests from the same stripe-queue multiple times through the stripe-thread, effectively reducing the number of additional write I/Os. Finally, SSRAID presents a stage buffer based on data merging. During a partial stripe write, write-induced read I/Os on the SSD are transformed into direct accesses to the stage buffer, effectively reducing write-induced read I/Os. Compared to StRAID, SSRAID improves average sequential write throughput by 86% and reduces average sequential write latency by 61% in the optimal case.
Vol. 35, No. 10, pp. 1841-1853. Publication type: Journal Article.
Citations: 0
Proteus: Simulating the Performance of Distributed DNN Training
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-14 DOI: 10.1109/TPDS.2024.3443255
Jiangfei Duan;Xiuhong Li;Ping Xu;Xingcheng Zhang;Shengen Yan;Yun Liang;Dahua Lin
Abstract: DNN models are becoming increasingly larger to achieve unprecedented accuracy, and the accompanying increased computation and memory requirements necessitate the employment of massive clusters and elaborate parallelization strategies to accelerate DNN training. In order to better optimize the performance and analyze the cost, it is indispensable to model the training throughput of distributed DNN training. However, complex parallelization strategies and the resulting complex runtime behaviors make it challenging to construct an accurate performance model. In this article, we present Proteus, the first standalone simulator to model the performance of complex parallelization strategies through simulation execution. Proteus first models complex parallelization strategies with a unified representation named Strategy Tree. Then, it compiles the strategy tree into a distributed execution graph and simulates the complex runtime behaviors, comp-comm overlap and bandwidth sharing, with a Hierarchical Topo-Aware Executor (HTAE). We finally evaluate Proteus across a wide variety of DNNs on three hardware configurations. Experimental results show that Proteus achieves 3.0% average prediction error and preserves order for training throughput of various parallelization strategies. Compared to state-of-the-art approaches, Proteus reduces prediction error by up to 133.8%.
Vol. 35, No. 10, pp. 1867-1878. Publication type: Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10636756
Citations: 0
Opca: Enabling Optimistic Concurrent Access for Multiple Users in Oblivious Data Storage
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-12 DOI: 10.1109/TPDS.2024.3441623
Yuezhi Che;Dazhao Cheng;Xiao Wang;Rujia Wang
Abstract: The challenges of data privacy and security posed by data outsourcing are becoming increasingly prevalent. Oblivious RAM (ORAM)-based oblivious data storage guarantees data confidentiality through data encryption and access pattern obfuscation. However, it suffers from performance degradation and low throughput. To address these issues, the concurrency of ORAM in a multi-user scenario has been explored. We investigate several existing concurrent oblivious data storage solutions and discover that a trusted proxy is used to serve concurrent accesses between users and storage, with processing locks involved in the proxy to ensure correctness and prevent conflicts. The proxy-based system is inherently prone to pessimistic concurrency control, and as the number of users grows, a proxy might become a performance bottleneck, causing significant delays. In this study, we propose Opca, a novel oblivious data storage framework that enables optimistic concurrent access. Opca refines the proxy design by temporally storing multiple versions of modified data with labeled timestamps, committing only the latest version to the storage during a separate processing period. Opca is implemented and evaluated in different real-world storage backends with a scalable number of users, and its performance is compared to alternative schemes. Opca outperforms the state-of-the-art concurrent oblivious storage system TaoStore, which relies on a similar system setting. Our results show that Opca can improve throughput by 3.77x and reduce response time by 73.5%.
Vol. 35, No. 11, pp. 1891-1903. Publication type: Journal Article.
Citations: 0
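The Opca abstract above describes a proxy that keeps several timestamped versions of modified data and commits only the latest one during a separate processing period. The Python sketch below illustrates that single idea under simplifying assumptions: the class name `VersionBuffer`, its methods, and the integer timestamps are hypothetical, and the sketch deliberately omits ORAM, encryption, and access-pattern obfuscation, so it is not the Opca design itself.

```python
from typing import Dict, List, Tuple


class VersionBuffer:
    """Hypothetical proxy-side buffer of timestamped writes per key."""

    def __init__(self) -> None:
        # key -> list of (timestamp, value) pairs, in arrival order
        self._versions: Dict[str, List[Tuple[int, bytes]]] = {}

    def write(self, key: str, timestamp: int, value: bytes) -> None:
        """Accept a write without locking; just record a new version."""
        self._versions.setdefault(key, []).append((timestamp, value))

    def commit_latest(self) -> Dict[str, bytes]:
        """Return the newest version of each key and clear the buffer."""
        latest = {
            key: max(versions, key=lambda version: version[0])[1]
            for key, versions in self._versions.items()
        }
        self._versions.clear()
        return latest


if __name__ == "__main__":
    buffer = VersionBuffer()
    buffer.write("a", 1, b"x")
    buffer.write("a", 3, b"z")  # newest version of "a"
    buffer.write("a", 2, b"y")  # arrives late but carries an older timestamp
    buffer.write("b", 1, b"q")
    print(buffer.commit_latest())
```

Writers never wait on each other here; conflicts are resolved at commit time by timestamp, which is the optimistic flavor of concurrency control the abstract contrasts with lock-based proxies.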
Which Coupled is Best Coupled? An Exploration of AIMC Tile Interfaces and Load Balancing for CNNs
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-02 DOI: 10.1109/TPDS.2024.3437657
Joshua Klein;Irem Boybat;Giovanni Ansaloni;Marina Zapater;David Atienza
Abstract: Due to stringent energy and performance constraints, edge AI computing often employs heterogeneous systems that utilize both general-purpose CPUs and accelerators. Analog in-memory computing (AIMC) is a well-known AI inference solution that overcomes computational bottlenecks by performing matrix-vector multiplication operations (MVMs) in constant time. However, the tiles of AIMC-based accelerators are limited by the number of weights they can hold. State-of-the-art research often sizes neural networks to AIMC tiles (or vice-versa), but does not consider cases where AIMC tiles cannot cover the whole network due to lack of tile resources or the network size. In this work, we study the trade-offs of available AIMC tile resources, neural network coverage, AIMC tile proximity to compute resources, and multi-core load balancing techniques. We first perform a study of single-layer performance and energy scalability of AIMC tiles in the two most typical AIMC acceleration targets: dense/fully-connected layers and convolutional layers. This study guides the methodology with which we approach parameter allocation to AIMC tiles in the context of large edge neural networks, both where AIMC tiles are close to the CPU (tightly-coupled) and cannot share resources across the system, and where AIMC tiles are far from the CPU (loosely-coupled) and can employ workload stealing. We explore the performance and energy trends of six modern CNNs using different methods of load balancing for differently-coupled system configurations with variable AIMC tile resources. We show that, by properly distributing workloads, AIMC acceleration can be made highly effective even on under-provisioned systems. As an example, 5.9x speedup and 5.6x energy gains were measured on an 8-core system, for a 41% coverage of neural network parameters.
Vol. 35, No. 10, pp. 1780-1795. Publication type: Journal Article.
Citations: 0
Locality-Preserving Graph Traversal With Split Live Migration
IF 5.6, Zone 2 (Computer Science)
IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-08-02 DOI: 10.1109/TPDS.2024.3436828
Rong Chen;Xingda Wei;Xiating Xie;Haibo Chen
Abstract: Graphs model many kinds of real-world data, such as social, transportation, biology, and communication data. Hence, graph traversal, including multi-hop or graph-walking queries, has been the key operation atop graph stores. However, since different graph traversals may touch different sets of vertices, it is hard or even impossible to have a one-size-fits-all graph partitioning algorithm that preserves access locality for various graph traversal workloads. Meanwhile, prior shard-based migration faces a dilemma: coarse-grained migration may incur more migration overhead than the locality benefits it brings, while fine-grained migration usually requires excessive metadata and incurs non-trivial maintenance costs. We present Pragh, an efficient locality-preserving live graph migration scheme for graph stores in the form of key-value pairs. The key idea of Pragh is a split migration model that only migrates values physically while retaining keys in the initial location. This allows fine-grained migration while avoiding the need to maintain excessive metadata. Pragh integrates an RDMA-friendly location cache from DrTM-KV to provide fully-localized access to migrated data and further makes a novel reuse of the cache replacement policy for lightweight monitoring. Pragh further supports evolving graphs through a check-and-forward mechanism to resolve the conflict between updates and migration of graph data. Evaluations on an 8-node RDMA-capable cluster (100 Gbps) using a representative graph traversal benchmark show that Pragh can increase the throughput by up to 19× and decrease the median latency by up to 94%, thanks to split live migration that eliminates 97% of remote accesses. A port of split live migration to Wukong shows up to 2.53× throughput improvement on representative workloads like LUBM-10240, thanks to an 88% reduction in remote accesses. This further confirms the effectiveness and generality of Pragh. Finally, though Pragh focuses on RDMA-based graph traversal, we show its generality by extending it to support graph traversals under traditional networking. Evaluations on the graph traversal benchmarks and graph query workloads on the same cluster but with a 10 Gbps TCP/IP network further confirm its effectiveness without RDMA. Specifically, when evaluated on LUBM-10240, Wukong-TCP with Pragh can achieve up to 1.87× throughput improvement with a 56% decrease in remote accesses.
Vol. 35, No. 10, pp. 1810-1825. Publication type: Journal Article.
Citations: 0
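The Locality-Preserving Graph Traversal abstract above describes a split migration model in which keys stay at their initial location and only values move. The Python sketch below is a toy, single-process illustration of that idea. The class `SplitStore`, its byte-sum placement rule, and the node count are hypothetical, and the sketch does not model RDMA, the location cache, or the check-and-forward mechanism mentioned in the abstract.

```python
from typing import Dict, List, Tuple


class SplitStore:
    """Toy key-value store: keys stay on a home node, values may migrate."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        # Per-node key index: key -> node that currently holds the value.
        self.key_index: List[Dict[str, int]] = [{} for _ in range(num_nodes)]
        # Per-node value storage: key -> value.
        self.values: List[Dict[str, str]] = [{} for _ in range(num_nodes)]

    def home(self, key: str) -> int:
        # Hypothetical, deterministic placement rule for this sketch only.
        return sum(key.encode()) % self.num_nodes

    def put(self, key: str, value: str) -> None:
        node = self.home(key)
        holder = self.key_index[node].get(key, node)  # follow earlier migration
        self.values[holder][key] = value
        self.key_index[node][key] = holder

    def migrate_value(self, key: str, target: int) -> None:
        """Move only the value; the key entry stays on its home node."""
        if not 0 <= target < self.num_nodes:
            raise ValueError("target node out of range")
        node = self.home(key)
        source = self.key_index[node][key]
        self.values[target][key] = self.values[source].pop(key)
        self.key_index[node][key] = target

    def get(self, key: str) -> Tuple[int, str]:
        """Return (node holding the value, value) via the home-node index."""
        holder = self.key_index[self.home(key)][key]
        return holder, self.values[holder][key]


if __name__ == "__main__":
    store = SplitStore(num_nodes=4)
    store.put("v1", "edges-of-v1")
    store.migrate_value("v1", (store.home("v1") + 1) % 4)
    print(store.get("v1"))
```

Because the key never moves, a lookup needs no separate migration directory: the home node's index entry is the only metadata that changes when a value is relocated.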