{"title":"SGKD:用于图表示学习的可扩展和有效的知识蒸馏框架","authors":"Yufei He, Yao Ma","doi":"10.1109/ICDMW58026.2022.00091","DOIUrl":null,"url":null,"abstract":"As Graph Neural Networks (GNNs) are widely used in various fields, there is a growing demand for improving their efficiency and scalablity. Knowledge Distillation (KD), a classical methods for model compression and acceleration, has been gradually introduced into the field of graph learning. More recently, it has been shown that, through knowledge distillation, the predictive capability of a well-trained GNN model can be transferred to lightweight and easy-to-deploy MLP models. Such distilled MLPs are able to achieve comparable performance as their corresponding G NN teachers while being significantly more efficient in terms of both space and time. However, the research of KD for graph learning is still in its early stage and there exist several limitations in the existing KD framework. The major issues lie in distilled MLPs lack useful information about the graph structure and logits of teacher are not always reliable. In this paper, we propose a Scalable and effective graph neural network Knowledge Distillation framework (SGKD) to address these issues. Specifically, to include the graph, we use feature propagation as preprocessing to provide MLPs with graph structure-aware features in the original feature space; to address unreliable logits of teacher, we introduce simple yet effective training strategies such as masking and temperature. With these innovations, our framework is able to be more effective while remaining scalable and efficient in training and inference. We conducted comprehensive experiments on eight datasets of different sizes - up to 100 million nodes - under various settings. The results demonstrated that SG KD is able to significantly outperform existing KD methods and even achieve comparable performance with their state-of-the-art GNN teachers.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"SGKD: A Scalable and Effective Knowledge Distillation Framework for Graph Representation Learning\",\"authors\":\"Yufei He, Yao Ma\",\"doi\":\"10.1109/ICDMW58026.2022.00091\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As Graph Neural Networks (GNNs) are widely used in various fields, there is a growing demand for improving their efficiency and scalablity. Knowledge Distillation (KD), a classical methods for model compression and acceleration, has been gradually introduced into the field of graph learning. More recently, it has been shown that, through knowledge distillation, the predictive capability of a well-trained GNN model can be transferred to lightweight and easy-to-deploy MLP models. Such distilled MLPs are able to achieve comparable performance as their corresponding G NN teachers while being significantly more efficient in terms of both space and time. However, the research of KD for graph learning is still in its early stage and there exist several limitations in the existing KD framework. The major issues lie in distilled MLPs lack useful information about the graph structure and logits of teacher are not always reliable. 
In this paper, we propose a Scalable and effective graph neural network Knowledge Distillation framework (SGKD) to address these issues. Specifically, to include the graph, we use feature propagation as preprocessing to provide MLPs with graph structure-aware features in the original feature space; to address unreliable logits of teacher, we introduce simple yet effective training strategies such as masking and temperature. With these innovations, our framework is able to be more effective while remaining scalable and efficient in training and inference. We conducted comprehensive experiments on eight datasets of different sizes - up to 100 million nodes - under various settings. The results demonstrated that SG KD is able to significantly outperform existing KD methods and even achieve comparable performance with their state-of-the-art GNN teachers.\",\"PeriodicalId\":146687,\"journal\":{\"name\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW58026.2022.00091\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW58026.2022.00091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SGKD: A Scalable and Effective Knowledge Distillation Framework for Graph Representation Learning
As Graph Neural Networks (GNNs) are widely used in various fields, there is a growing demand for improving their efficiency and scalability. Knowledge Distillation (KD), a classical method for model compression and acceleration, has gradually been introduced into the field of graph learning. More recently, it has been shown that, through knowledge distillation, the predictive capability of a well-trained GNN model can be transferred to lightweight and easy-to-deploy MLP models. Such distilled MLPs are able to achieve performance comparable to their corresponding GNN teachers while being significantly more efficient in terms of both space and time. However, research on KD for graph learning is still in its early stage, and the existing KD frameworks have several limitations. The major issues are that distilled MLPs lack useful information about the graph structure and that the teacher's logits are not always reliable. In this paper, we propose a Scalable and effective graph neural network Knowledge Distillation framework (SGKD) to address these issues. Specifically, to incorporate the graph structure, we use feature propagation as a preprocessing step to provide MLPs with structure-aware features in the original feature space; to address the unreliable teacher logits, we introduce simple yet effective training strategies such as masking and temperature. With these innovations, our framework is more effective while remaining scalable and efficient in training and inference. We conducted comprehensive experiments on eight datasets of different sizes, up to 100 million nodes, under various settings. The results demonstrate that SGKD significantly outperforms existing KD methods and even achieves performance comparable to its state-of-the-art GNN teachers.
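The abstract names two concrete mechanisms: feature propagation as a preprocessing step, so that the MLP student receives graph-structure-aware features, and training strategies (masking and temperature) that guard against unreliable teacher logits. The sketch below is a minimal, hypothetical illustration of those two ideas in PyTorch; it is not the authors' implementation. The function names, the symmetric-normalization propagation rule, the propagation depth k, the confidence-threshold masking rule, and the unweighted sum of the two loss terms are all assumptions made for illustration.

```python
# Hypothetical sketch of (1) feature propagation as preprocessing and
# (2) a temperature-scaled, confidence-masked distillation loss.
# Not the paper's code; depth k and the confidence threshold are assumptions.
import torch
import torch.nn.functional as F


def propagate_features(adj: torch.Tensor, x: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Add self-loops, symmetrically normalize a dense adjacency matrix,
    and propagate node features k times: X <- (D^-1/2 (A+I) D^-1/2)^k X.
    This can be computed once, offline, before training the MLP student."""
    n = adj.size(0)
    a_hat = adj + torch.eye(n, device=adj.device, dtype=adj.dtype)
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    norm_adj = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
    for _ in range(k):
        x = norm_adj @ x
    return x


def masked_kd_loss(student_logits, teacher_logits, labels,
                   temperature: float = 2.0, conf_threshold: float = 0.7):
    """Cross-entropy on the true labels plus a KL distillation term that
    only keeps nodes where the teacher is sufficiently confident
    (an assumed masking rule, used here purely for illustration)."""
    ce = F.cross_entropy(student_logits, labels)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    mask = teacher_probs.max(dim=-1).values >= conf_threshold
    if mask.any():
        kd = F.kl_div(
            F.log_softmax(student_logits[mask] / temperature, dim=-1),
            teacher_probs[mask],
            reduction="batchmean",
        ) * temperature ** 2
    else:
        kd = torch.tensor(0.0, device=student_logits.device)
    return ce + kd
```

Because the propagation step depends only on the graph and the raw features, it can be precomputed once before training; the student then remains a plain MLP at both training and inference time, which is consistent with the efficiency claims in the abstract.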