HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration

Weijian Chen, Shuibing He, Haoyang Qu, Xuechen Zhang, Dan Feng

arXiv:2409.00657 · arXiv - CS - Distributed, Parallel, and Cluster Computing · 2024-09-01
Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric: they transfer massive volumes of graph vertex features to the GNN models, which creates a significant communication bottleneck. Recognizing that the model size is often far smaller than the feature size, we propose LeapGNN, a feature-centric framework that reverses this paradigm by bringing GNN models to the vertex features. To make this migration truly effective, we first propose a micrograph-based training strategy that trains the model on a refined graph structure with superior locality, reducing remote feature retrieval. We then devise a feature pre-gathering approach that merges multiple fetch operations into a single one, eliminating redundant feature transmissions. Finally, we employ a micrograph merging method that adjusts the number of micrographs per worker to minimize kernel switches and synchronization overhead. Our experiments demonstrate that LeapGNN achieves a speedup of up to 4.2x over the state-of-the-art method P3.
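
The abstract does not include code, but the feature pre-gathering step has a natural minimal form: concatenate the vertex IDs requested by all micrographs, deduplicate them, and perform one batched fetch instead of one fetch per micrograph. The PyTorch sketch below illustrates that idea under stated assumptions; the function and variable names (pre_gather_features, feature_store, micrograph_batches) are hypothetical and not taken from the paper.

```python
import torch

def pre_gather_features(feature_store, micrograph_batches):
    """Merge per-micrograph feature fetches into one deduplicated fetch.

    feature_store: a (num_vertices, feat_dim) tensor standing in for a
    remote feature partition (hypothetical; in LeapGNN this would be a
    remote lookup). micrograph_batches: a list of 1-D LongTensors of
    vertex IDs, one per micrograph.
    """
    # Concatenate all requested vertex IDs, then deduplicate them so that
    # each feature row is transmitted only once, even when several
    # micrographs request the same vertex.
    all_ids = torch.cat(micrograph_batches)
    unique_ids, inverse = torch.unique(all_ids, return_inverse=True)

    # A single batched fetch for the whole set of micrographs.
    fetched = feature_store[unique_ids]

    # Re-split the gathered rows back to their originating micrographs
    # using the inverse mapping returned by torch.unique.
    views, offset = [], 0
    for batch in micrograph_batches:
        idx = inverse[offset:offset + batch.numel()]
        views.append(fetched[idx])
        offset += batch.numel()
    return views
```

In a real distributed setting the indexing into feature_store would be a network round trip to the worker owning that partition; the point of the sketch is only that each unique vertex row crosses the network once, regardless of how many micrographs request it.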