arXiv - CS - Distributed, Parallel, and Cluster Computing: Latest Publications

DiReDi: Distillation and Reverse Distillation for AIoT Applications
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.08308
Chen Sun, Qing Tong, Wenshuang Yang, Wenqi Zhang
{"title":"DiReDi: Distillation and Reverse Distillation for AIoT Applications","authors":"Chen Sun, Qing Tong, Wenshuang Yang, Wenqi Zhang","doi":"arxiv-2409.08308","DOIUrl":"https://doi.org/arxiv-2409.08308","url":null,"abstract":"Typically, the significant efficiency can be achieved by deploying different\u0000edge AI models in various real world scenarios while a few large models manage\u0000those edge AI models remotely from cloud servers. However, customizing edge AI\u0000models for each user's specific application or extending current models to new\u0000application scenarios remains a challenge. Inappropriate local training or fine\u0000tuning of edge AI models by users can lead to model malfunction, potentially\u0000resulting in legal issues for the manufacturer. To address aforementioned\u0000issues, this paper proposes an innovative framework called \"DiReD\", which\u0000involves knowledge DIstillation & REverse DIstillation. In the initial step, an\u0000edge AI model is trained with presumed data and a KD process using the cloud AI\u0000model in the upper management cloud server. This edge AI model is then\u0000dispatched to edge AI devices solely for inference in the user's application\u0000scenario. When the user needs to update the edge AI model to better fit the\u0000actual scenario, the reverse distillation (RD) process is employed to extract\u0000the knowledge: the difference between user preferences and the manufacturer's\u0000presumptions from the edge AI model using the user's exclusive data. Only the\u0000extracted knowledge is reported back to the upper management cloud server to\u0000update the cloud AI model, thus protecting user privacy by not using any\u0000exclusive data. The updated cloud AI can then update the edge AI model with the\u0000extended knowledge. Simulation results demonstrate that the proposed \"DiReDi\"\u0000framework allows the manufacturer to update the user model by learning new\u0000knowledge from the user's actual scenario with private data. The initial\u0000redundant knowledge is reduced since the retraining emphasizes user private\u0000data.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
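The DiReDi entry above builds on standard knowledge distillation (KD) from a large cloud model to a small edge model, followed by a reverse-distillation step on the device. The sketch below shows only the forward KD step under common soft-label distillation assumptions; the architectures, temperature, and loss weighting are illustrative choices, not details from the paper, and the reverse-distillation step is not shown.

# Minimal soft-label knowledge distillation step (cloud teacher -> edge student).
# All architectures and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden, out_dim):
    return nn.Sequential(nn.Flatten(), nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

teacher = mlp(28 * 28, 512, 10)   # stand-in for the large cloud AI model
student = mlp(28 * 28, 32, 10)    # stand-in for the small edge AI model
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

def kd_step(x, y, T=4.0, alpha=0.7):
    """One KD step: blend the soft-label KL term with the hard-label CE term."""
    with torch.no_grad():
        t_logits = teacher(x)                      # teacher runs in inference mode
    s_logits = student(x)
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(s_logits, y)
    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.randn(64, 1, 28, 28)                     # "presumed" training batch
y = torch.randint(0, 10, (64,))
print(kd_step(x, y))

The paper's reverse-distillation step would run in the opposite direction on the deployed student, extracting only the knowledge that differs from the manufacturer's presumptions.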
DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.07734
Kangyang Luo, Shuai Wang, Yexuan Fu, Renrong Shao, Xiang Li, Yunshi Lan, Ming Gao, Jinlong Shu
{"title":"DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning","authors":"Kangyang Luo, Shuai Wang, Yexuan Fu, Renrong Shao, Xiang Li, Yunshi Lan, Ming Gao, Jinlong Shu","doi":"arxiv-2409.07734","DOIUrl":"https://doi.org/arxiv-2409.07734","url":null,"abstract":"Federated Learning (FL) is a distributed machine learning scheme in which\u0000clients jointly participate in the collaborative training of a global model by\u0000sharing model information rather than their private datasets. In light of\u0000concerns associated with communication and privacy, one-shot FL with a single\u0000communication round has emerged as a de facto promising solution. However,\u0000existing one-shot FL methods either require public datasets, focus on model\u0000homogeneous settings, or distill limited knowledge from local models, making it\u0000difficult or even impractical to train a robust global model. To address these\u0000limitations, we propose a new data-free dual-generator adversarial distillation\u0000method (namely DFDG) for one-shot FL, which can explore a broader local models'\u0000training space via training dual generators. DFDG is executed in an adversarial\u0000manner and comprises two parts: dual-generator training and dual-model\u0000distillation. In dual-generator training, we delve into each generator\u0000concerning fidelity, transferability and diversity to ensure its utility, and\u0000additionally tailor the cross-divergence loss to lessen the overlap of dual\u0000generators' output spaces. In dual-model distillation, the trained dual\u0000generators work together to provide the training data for updates of the global\u0000model. At last, our extensive experiments on various image classification tasks\u0000show that DFDG achieves significant performance gains in accuracy compared to\u0000SOTA baselines.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
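The DFDG abstract above mentions training two data-free generators and a cross-divergence loss that keeps their output spaces from overlapping. The sketch below shows one simple way to express such a penalty (rewarding divergence between a classifier's predictions on the two generators' samples); it is an interpretation for illustration only, and the generator architecture, the classifier, and the exact loss form are assumptions rather than the paper's definitions.

# Two noise-to-sample generators plus a toy "cross-divergence"-style penalty
# that pushes their output distributions (as seen by a classifier) apart.
# Architectures and the exact loss form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, z_dim=64, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())
    def forward(self, z):
        return self.net(z)

def cross_divergence_penalty(classifier, x1, x2):
    """Negative KL between the predictive distributions on the two generators'
    batches: minimizing it encourages the generators to cover different
    regions of the classifier's decision space."""
    log_p1 = F.log_softmax(classifier(x1), dim=1)
    p2 = F.softmax(classifier(x2), dim=1).detach()
    return -F.kl_div(log_p1, p2, reduction="batchmean")

g1, g2 = Generator(), Generator()
classifier = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
penalty = cross_divergence_penalty(classifier, g1(z1), g2(z2))
print(penalty.item())   # would be added to each generator's fidelity/diversity losses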
A Study on Asynchronous Vote-based Blockchains
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.08161
Yibin Xu, Jianhua Shao, Tijs Slaats, Boris Düdder, Yongluan Zhou
{"title":"A Study on Asynchronous Vote-based Blockchains","authors":"Yibin Xu, Jianhua Shao, Tijs Slaats, Boris Düdder, Yongluan Zhou","doi":"arxiv-2409.08161","DOIUrl":"https://doi.org/arxiv-2409.08161","url":null,"abstract":"Vote-based blockchains construct a state machine replication (SMR) system\u0000among participating nodes, using Byzantine Fault Tolerance (BFT) consensus\u0000protocols to transition from one state to another. Currently, they rely on\u0000either synchronous or partially synchronous networks with leader-based\u0000coordination or costly Asynchronous Common Subset (ACS) protocols in\u0000asynchronous settings, making them impractical for large-scale asynchronous\u0000applications. To make Asynchronous SMR scalable, this paper proposes a emph{validated\u0000strong} BFT consensus model that allows leader-based coordination in\u0000asynchronous settings. Our BFT consensus model offers the same level of\u0000tolerance as binary byzantine agreement but does not demand consistency among\u0000honest nodes before they vote. An SMR using our model allows nodes to operate\u0000in different, tentative, but mutually exclusive states until they eventually\u0000converge on the same state. We propose an asynchronous BFT protocol for\u0000vote-based blockchains employing our consensus model to address several\u0000critical challenges: how to ensure that nodes eventually converge on the same\u0000state across voting rounds, how to assure that a blockchain will steadily\u0000progress through epochs while reaching consensus for previous epochs, and how\u0000to maintain robust byzantine fault tolerance. Our protocol greatly reduces message complexity and is the first one to\u0000achieve linear view changes without relying on threshold signatures. We prove\u0000that an asynchronous blockchain built on our protocol can operate with the\u0000emph{same} simplicity and efficiency as partially synchronous blockchains\u0000built on, e.g. HotStuff-2. This facilitates deploying asynchronous blockchains\u0000across large-scale networks.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
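As context for the vote-based SMR setting described above, the sketch below shows the generic BFT quorum rule: with n >= 3f + 1 nodes, a block commits once 2f + 1 distinct nodes have voted for it. It is a textbook illustration, not the paper's validated-strong consensus protocol, which additionally handles tentative states, epochs, and asynchronous view changes.

# Generic BFT quorum counting for a vote-based blockchain: with n >= 3f + 1
# nodes, a block commits once 2f + 1 distinct nodes have voted for it.
# This is a textbook illustration, not the protocol proposed in the paper.
from collections import defaultdict

class VoteCollector:
    def __init__(self, n_nodes, f):
        assert n_nodes >= 3 * f + 1, "BFT safety needs n >= 3f + 1"
        self.quorum = 2 * f + 1
        self.votes = defaultdict(set)          # block_id -> set of voter ids

    def add_vote(self, block_id, voter_id):
        """Record one vote; return True once block_id has reached a quorum."""
        self.votes[block_id].add(voter_id)
        return len(self.votes[block_id]) >= self.quorum

collector = VoteCollector(n_nodes=4, f=1)
print(collector.add_vote("block-42", 0))       # False
print(collector.add_vote("block-42", 1))       # False
print(collector.add_vote("block-42", 2))       # True: 3 = 2f + 1 votes reached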
Cooperative Inference with Interleaved Operator Partitioning for CNNs
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-12 DOI: arxiv-2409.07693
Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li
{"title":"Cooperative Inference with Interleaved Operator Partitioning for CNNs","authors":"Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li","doi":"arxiv-2409.07693","DOIUrl":"https://doi.org/arxiv-2409.07693","url":null,"abstract":"Deploying deep learning models on Internet of Things (IoT) devices often\u0000faces challenges due to limited memory resources and computing capabilities.\u0000Cooperative inference is an important method for addressing this issue,\u0000requiring the partitioning and distributive deployment of an intelligent model.\u0000To perform horizontal partitions, existing cooperative inference methods take\u0000either the output channel of operators or the height and width of feature maps\u0000as the partition dimensions. In this manner, since the activation of operators\u0000is distributed, they have to be concatenated together before being fed to the\u0000next operator, which incurs the delay for cooperative inference. In this paper,\u0000we propose the Interleaved Operator Partitioning (IOP) strategy for CNN models.\u0000By partitioning an operator based on the output channel dimension and its\u0000successive operator based on the input channel dimension, activation\u0000concatenation becomes unnecessary, thereby reducing the number of communication\u0000connections, which consequently reduces cooperative inference de-lay. Based on\u0000IOP, we further present a model segmentation algorithm for minimizing\u0000cooperative inference time, which greedily selects operators for IOP pairing\u0000based on the inference delay benefit harvested. Experimental results\u0000demonstrate that compared with the state-of-the-art partition approaches used\u0000in CoEdge, the IOP strategy achieves 6.39% ~ 16.83% faster acceleration and\u0000reduces peak memory footprint by 21.22% ~ 49.98% for three classical image\u0000classification models.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
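The IOP idea above can be seen concretely with two successive convolutions: if the first is split along output channels and the second along input channels with the same split, each device's two shards compose directly and only a final sum is needed. The sketch below checks this equivalence numerically; the tensor shapes and the two-way split are assumptions for illustration.

# Interleaved partitioning of two successive convolutions: conv1 is split along
# its OUTPUT channels and conv2 along its INPUT channels using the same split,
# so each device runs its two shards back-to-back and the full result is just
# the sum of the partial outputs, with no activation concatenation in between.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)                   # input feature map
w1 = torch.randn(8, 3, 3, 3)                    # conv1: 3 -> 8 channels
w2 = torch.randn(4, 8, 3, 3)                    # conv2: 8 -> 4 channels

# Monolithic reference: conv1 -> ReLU -> conv2
ref = F.conv2d(F.relu(F.conv2d(x, w1, padding=1)), w2, padding=1)

# Device 0 holds conv1 output channels 0..3 and conv2 input channels 0..3;
# device 1 holds channels 4..7 of both. ReLU is element-wise, so it can be
# applied locally on each shard.
partials = []
for lo, hi in [(0, 4), (4, 8)]:
    a = F.relu(F.conv2d(x, w1[lo:hi], padding=1))           # local conv1 shard
    partials.append(F.conv2d(a, w2[:, lo:hi], padding=1))   # local conv2 partial
out = partials[0] + partials[1]                 # one reduction replaces concatenation

print(torch.allclose(ref, out, atol=1e-4))      # True: partial sums match

The single final addition plays the role of one reduction, which is what lets interleaved partitioning avoid the activation concatenation that channel- or height/width-partitioning schemes require.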
Data Backup System with No Impact on Business Processing Utilizing Storage and Container Technologies
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.07081
Satoru Watanabe
{"title":"Data Backup System with No Impact on Business Processing Utilizing Storage and Container Technologies","authors":"Satoru Watanabe","doi":"arxiv-2409.07081","DOIUrl":"https://doi.org/arxiv-2409.07081","url":null,"abstract":"Data backup is a core technology for improving system resilience to system\u0000failures. Data backup in enterprise systems is required to minimize the impacts\u0000on business processing, which can be categorized into two factors: system\u0000slowdown and downtime. To eliminate system slowdown, asynchronous data copy\u0000(ADC) technology is prevalent, which copies data asynchronously with original\u0000data updates. However, the ADC can collapse backup data when applied to\u0000enterprise systems with multiple resources. Then, the demonstration system\u0000employed consistency group technology, which makes the order of data updates\u0000the same between the original and backup data. In addition, we developed a\u0000container platform operator to unravel the complicated correspondence between\u0000storage volumes and applications. The operator automates the configuration of\u0000the ADC with the setting of consistency groups. We integrated the storage and\u0000container technologies into the demonstration system, which can eliminate both\u0000system slowdown and downtime.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
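The abstract above hinges on consistency groups: the backup side must apply writes to all related volumes in the same order in which they were issued on the primary, even though the asynchronous copy link may deliver them out of order. The sketch below illustrates that ordering rule with a global sequence number; it is a simplification for illustration, not the demonstrated system's storage implementation.

# A consistency group applies writes to its backup volumes only as a contiguous,
# globally ordered prefix, so multi-volume backups never see reordered updates.
# The sequence-number scheme and in-memory "volumes" are illustrative assumptions.
import heapq

class ConsistencyGroup:
    def __init__(self, volume_ids):
        self.backup = {v: {} for v in volume_ids}   # volume -> {block: data}
        self.pending = []                           # min-heap ordered by seqno
        self.next_seq = 0                           # next sequence number to apply

    def enqueue(self, seqno, volume, block, data):
        """Writes may arrive out of order over the asynchronous copy link."""
        heapq.heappush(self.pending, (seqno, volume, block, data))

    def apply_in_order(self):
        """Apply only a contiguous prefix of writes, preserving primary order."""
        while self.pending and self.pending[0][0] == self.next_seq:
            _, volume, block, data = heapq.heappop(self.pending)
            self.backup[volume][block] = data
            self.next_seq += 1

cg = ConsistencyGroup(["db-data", "db-log"])
cg.enqueue(1, "db-data", 7, "row v2")   # arrives before the write with seqno 0
cg.enqueue(0, "db-log", 3, "commit tx")
cg.apply_in_order()
print(cg.backup)                        # both writes applied, in primary order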
FreeRide: Harvesting Bubbles in Pipeline Parallelism
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.06941
Jiashu Zhang, Zihan Pan, Yiming (Molly) Xu, Khuzaima Daudjee, Sihang Liu
{"title":"FreeRide: Harvesting Bubbles in Pipeline Parallelism","authors":"Jiashu ZhangYiming, Zihan PanYiming, MollyYiming, Xu, Khuzaima Daudjee, Sihang Liu","doi":"arxiv-2409.06941","DOIUrl":"https://doi.org/arxiv-2409.06941","url":null,"abstract":"The occurrence of bubbles in pipeline parallelism is an inherent limitation\u0000that can account for more than 40% of the large language model (LLM) training\u0000time and is one of the main reasons for the underutilization of GPU resources\u0000in LLM training. Harvesting these bubbles for GPU side tasks can increase\u0000resource utilization and reduce training costs but comes with challenges.\u0000First, because bubbles are discontinuous with various shapes, programming side\u0000tasks becomes difficult while requiring excessive engineering effort. Second, a\u0000side task can compete with pipeline training for GPU resources and incur\u0000significant overhead. To address these challenges, we propose FreeRide, a\u0000system designed to harvest bubbles in pipeline parallelism for side tasks.\u0000FreeRide provides programmers with interfaces to implement side tasks easily,\u0000manages bubbles and side tasks during pipeline training, and controls access to\u0000GPU resources by side tasks to reduce overhead. We demonstrate that FreeRide\u0000achieves 7.8% average cost savings with a negligible overhead of about 1% in\u0000training LLMs while serving model training, graph analytics, and image\u0000processing side tasks.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
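The FreeRide abstract above is about running side tasks only inside pipeline bubbles and yielding the GPU back when a stage resumes. The sketch below shows the scheduling idea at its simplest: an interruptible side task that is driven only while a bubble's time budget lasts. The generator-based interface, the durations, and the CPU sleep standing in for GPU work are all assumptions, not FreeRide's actual API.

# Conceptual sketch: drive a side task during pipeline "bubbles" (idle GPU
# intervals) and stop as soon as the bubble's time budget is spent.
import time

def side_task():
    """An interruptible side task expressed as a generator: each step is a
    small unit of work, so it can be paused at bubble boundaries."""
    step = 0
    while True:
        time.sleep(0.001)     # stand-in for a small unit of GPU work
        step += 1
        yield step

def run_during_bubble(task, bubble_seconds):
    """Advance the side task until the bubble's time budget runs out."""
    deadline = time.monotonic() + bubble_seconds
    steps = 0
    while time.monotonic() < deadline:
        steps = next(task)
    return steps

task = side_task()
for bubble in (0.02, 0.05, 0.01):       # bubble lengths observed per iteration
    done = run_during_bubble(task, bubble)
    # ... the pipeline stage resumes here with full GPU priority ...
print(f"side task progressed to step {done}")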
HERL: Tiered Federated Learning with Adaptive Homomorphic Encryption using Reinforcement Learning
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.07631
Jiaxang Tang, Zeshan Fayyaz, Mohammad A. Salahuddin, Raouf Boutaba, Zhi-Li Zhang, Ali Anwar
{"title":"HERL: Tiered Federated Learning with Adaptive Homomorphic Encryption using Reinforcement Learning","authors":"Jiaxang Tang, Zeshan Fayyaz, Mohammad A. Salahuddin, Raouf Boutaba, Zhi-Li Zhang, Ali Anwar","doi":"arxiv-2409.07631","DOIUrl":"https://doi.org/arxiv-2409.07631","url":null,"abstract":"Federated Learning is a well-researched approach for collaboratively training\u0000machine learning models across decentralized data while preserving privacy.\u0000However, integrating Homomorphic Encryption to ensure data confidentiality\u0000introduces significant computational and communication overheads, particularly\u0000in heterogeneous environments where clients have varying computational\u0000capacities and security needs. In this paper, we propose HERL, a Reinforcement\u0000Learning-based approach that uses Q-Learning to dynamically optimize encryption\u0000parameters, specifically the polynomial modulus degree, $N$, and the\u0000coefficient modulus, $q$, across different client tiers. Our proposed method\u0000involves first profiling and tiering clients according to the chosen clustering\u0000approach, followed by dynamically selecting the most suitable encryption\u0000parameters using an RL-agent. Experimental results demonstrate that our\u0000approach significantly reduces the computational overhead while maintaining\u0000utility and a high level of security. Empirical results show that HERL improves\u0000utility by 17%, reduces the convergence time by up to 24%, and increases\u0000convergence efficiency by up to 30%, with minimal security loss.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
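HERL's core loop, as described above, is an RL agent picking encryption parameters ($N$, $q$) per client tier. The sketch below is a toy tabular, bandit-style version of that selection loop; the parameter grid, the tiering, the reward function, and all hyperparameters are invented for illustration and do not reflect the paper's state or reward design.

# Toy Q-learning-style selection of homomorphic-encryption parameters per
# client tier. Action grid, reward model, and hyperparameters are assumptions.
import random
from collections import defaultdict

ACTIONS = [(4096, 109), (8192, 218), (16384, 438)]   # (poly modulus degree N, total q bits)
TIERS = ["low", "mid", "high"]                        # client capability tiers

q_table = defaultdict(float)                          # (tier, action index) -> value
alpha, eps = 0.1, 0.2                                 # learning rate, exploration rate

def reward(tier, action):
    """Toy reward: larger parameters raise security/utility but cost more
    compute, and the cost hits weaker tiers harder."""
    n, q_bits = action
    cost = {"low": 2.0, "mid": 1.0, "high": 0.5}[tier] * (n / 4096)
    return q_bits / 109 - cost

for _ in range(2000):                                 # one-step (bandit-style) episodes
    tier = random.choice(TIERS)
    if random.random() < eps:
        a = random.randrange(len(ACTIONS))            # explore
    else:
        a = max(range(len(ACTIONS)), key=lambda i: q_table[(tier, i)])  # exploit
    q_table[(tier, a)] += alpha * (reward(tier, ACTIONS[a]) - q_table[(tier, a)])

for tier in TIERS:
    best = max(range(len(ACTIONS)), key=lambda i: q_table[(tier, i)])
    print(tier, "->", ACTIONS[best])                  # weaker tiers settle on smaller parameters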
Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.07232
Chayanon Wichitrnithed, Woo-Sun Yang, Yun (Helen) He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson Jr., Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste
{"title":"Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee","authors":"ChayanonNamo, WichitrnithedHelen, Woo-Sun-YangHelen, YunHelen, He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson Jr., Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste","doi":"arxiv-2409.07232","DOIUrl":"https://doi.org/arxiv-2409.07232","url":null,"abstract":"Currently, the Weather Research and Forecasting model (WRF) utilizes shared\u0000memory (OpenMP) and distributed memory (MPI) parallelisms. To take advantage of\u0000GPU resources on the Perlmutter supercomputer at NERSC, we port parts of the\u0000computationally expensive routines of the Fast Spectral Bin Microphysics (FSBM)\u0000microphysical scheme to NVIDIA GPUs using OpenMP device offloading directives.\u0000To facilitate this process, we explore a workflow for optimization which uses\u0000both runtime profilers and a static code inspection tool Codee to refactor the\u0000subroutine. We observe a 2.08x overall speedup for the CONUS-12km thunderstorm\u0000test case.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Distributed Convolutional Neural Network Training on Mobile and Edge Clusters
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-11 DOI: arxiv-2409.09083
Pranav Rama, Madison Threadgill, Andreas Gerstlauer
{"title":"Distributed Convolutional Neural Network Training on Mobile and Edge Clusters","authors":"Pranav Rama, Madison Threadgill, Andreas Gerstlauer","doi":"arxiv-2409.09083","DOIUrl":"https://doi.org/arxiv-2409.09083","url":null,"abstract":"The training of deep and/or convolutional neural networks (DNNs/CNNs) is\u0000traditionally done on servers with powerful CPUs and GPUs. Recent efforts have\u0000emerged to localize machine learning tasks fully on the edge. This brings\u0000advantages in reduced latency and increased privacy, but necessitates working\u0000with resource-constrained devices. Approaches for inference and training in\u0000mobile and edge devices based on pruning, quantization or incremental and\u0000transfer learning require trading off accuracy. Several works have explored\u0000distributing inference operations on mobile and edge clusters instead. However,\u0000there is limited literature on distributed training on the edge. Existing\u0000approaches all require a central, potentially powerful edge or cloud server for\u0000coordination or offloading. In this paper, we describe an approach for\u0000distributed CNN training exclusively on mobile and edge devices. Our approach\u0000is beneficial for the initial CNN layers that are feature map dominated. It is\u0000based on partitioning forward inference and back-propagation operations among\u0000devices through tiling and fusing to maximize locality and expose communication\u0000and memory-aware parallelism. We also introduce the concept of layer grouping\u0000to further fine-tune performance based on computation and communication\u0000trade-off. Results show that for a cluster of 2-6 quad-core Raspberry Pi3\u0000devices, training of an object-detection CNN provides a 2x-15x speedup with\u0000respect to a single core and up to 8x reduction in memory usage per device, all\u0000without sacrificing accuracy. Grouping offers up to 1.5x speedup depending on\u0000the reference profile and batch size.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
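The approach above partitions the feature-map-dominated early CNN layers spatially across devices, with tiling and fusing to preserve locality. The sketch below shows the basic tiling step for a single 3x3 convolution split height-wise across two devices with a one-row halo; the shapes, the device count, and the halo handling are illustrative assumptions rather than the paper's partitioning scheme.

# Height-wise tiling of a convolution across two devices: each device gets its
# tile of the input plus a one-row halo from its neighbor, convolves locally,
# and the stitched outputs equal the monolithic result. Fusing several early
# layers per tile (as described above) simply requires a deeper halo.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)                   # input feature map
w = torch.randn(16, 3, 3, 3)                    # a 3x3 convolution, padding=1
ref = F.conv2d(x, w, padding=1)                 # monolithic reference

# Device 0: rows 0..15 plus one halo row below; zero-pad left/right/top only.
top = F.conv2d(F.pad(x[:, :, 0:17], (1, 1, 1, 0)), w)
# Device 1: rows 16..31 plus one halo row above; zero-pad left/right/bottom only.
bottom = F.conv2d(F.pad(x[:, :, 15:32], (1, 1, 0, 1)), w)

out = torch.cat([top, bottom], dim=2)           # stitch the two output tiles
print(out.shape, torch.allclose(ref, out, atol=1e-5))   # (1, 16, 32, 32) True

Each device only ever exchanges the thin halo rows with its neighbor, which keeps communication proportional to tile boundaries rather than to whole feature maps.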
D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2024-09-10 DOI: arxiv-2409.09079
Rustam Guliyev, Aparajita Haldar, Hakan Ferhatosmanoglu
{"title":"D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks","authors":"Rustam Guliyev, Aparajita Haldar, Hakan Ferhatosmanoglu","doi":"arxiv-2409.09079","DOIUrl":"https://doi.org/arxiv-2409.09079","url":null,"abstract":"Graph Neural Network (GNN) models on streaming graphs entail algorithmic\u0000challenges to continuously capture its dynamic state, as well as systems\u0000challenges to optimize latency, memory, and throughput during both inference\u0000and training. We present D3-GNN, the first distributed, hybrid-parallel,\u0000streaming GNN system designed to handle real-time graph updates under online\u0000query setting. Our system addresses data management, algorithmic, and systems\u0000challenges, enabling continuous capturing of the dynamic state of the graph and\u0000updating node representations with fault-tolerance and optimal latency,\u0000load-balance, and throughput. D3-GNN utilizes streaming GNN aggregators and an\u0000unrolled, distributed computation graph architecture to handle cascading graph\u0000updates. To counteract data skew and neighborhood explosion issues, we\u0000introduce inter-layer and intra-layer windowed forward pass solutions.\u0000Experiments on large-scale graph streams demonstrate that D3-GNN achieves high\u0000efficiency and scalability. Compared to DGL, D3-GNN achieves a significant\u0000throughput improvement of about 76x for streaming workloads. The windowed\u0000enhancement further reduces running times by around 10x and message volumes by\u0000up to 15x at higher parallelism.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
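D3-GNN, as summarized above, keeps node representations up to date as graph updates stream in. The sketch below shows the smallest version of that idea: an incremental mean aggregator that updates only the affected node's running sum when an edge arrives. The distributed dataflow, windowing, and fault-tolerance machinery of D3-GNN are not modeled, and all names here are illustrative.

# Incremental (streaming) neighborhood aggregation: when an edge u -> v arrives,
# v's mean-of-neighbors embedding is refreshed from a running sum and count
# instead of being recomputed over all of v's neighbors.
import torch

class StreamingMeanAggregator:
    def __init__(self, num_nodes, dim):
        self.feat = torch.randn(num_nodes, dim)     # node features (assumed static here)
        self.sum = torch.zeros(num_nodes, dim)      # running neighbor-feature sums
        self.count = torch.zeros(num_nodes, 1)      # running neighbor counts

    def add_edge(self, u, v):
        """Process a streamed edge u -> v: only v's aggregate changes."""
        self.sum[v] += self.feat[u]
        self.count[v] += 1

    def representation(self, v):
        """Mean aggregation; a full GNN layer would follow with a linear + nonlinearity."""
        if self.count[v] == 0:
            return self.feat[v]
        return self.sum[v] / self.count[v]

agg = StreamingMeanAggregator(num_nodes=5, dim=8)
for u, v in [(0, 3), (1, 3), (4, 3)]:       # edges arriving on the stream
    agg.add_edge(u, v)
print(agg.representation(3).shape)          # torch.Size([8])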