TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems

Huiyuan Chen, Xiaoting Li, Kaixiong Zhou, Xia Hu, Chin-Chia Michael Yeh, Yan Zheng, Hao Yang
{"title":"TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems","authors":"Huiyuan Chen, Xiaoting Li, Kaixiong Zhou, Xia Hu, Chin-Chia Michael Yeh, Yan Zheng, Hao Yang","doi":"10.1145/3523227.3546760","DOIUrl":null,"url":null,"abstract":"There has been an explosion of interest in designing various Knowledge Graph Neural Networks (KGNNs), which achieve state-of-the-art performance and provide great explainability for recommendation. The promising performance is mainly resulting from their capability of capturing high-order proximity messages over the knowledge graphs. However, training KGNNs at scale is challenging due to the high memory usage. In the forward pass, the automatic differentiation engines (e.g., TensorFlow/PyTorch) generally need to cache all intermediate activation maps in order to compute gradients in the backward pass, which leads to a large GPU memory footprint. Existing work solves this problem by utilizing multi-GPU distributed frameworks. Nonetheless, this poses a practical challenge when seeking to deploy KGNNs in memory-constrained environments, especially for industry-scale graphs. Here we present TinyKG, a memory-efficient GPU-based training framework for KGNNs for the tasks of recommendation. Specifically, TinyKG uses exact activations in the forward pass while storing a quantized version of activations in the GPU buffers. During the backward pass, these low-precision activations are dequantized back to full-precision tensors, in order to compute gradients. To reduce the quantization errors, TinyKG applies a simple yet effective quantization algorithm to compress the activations, which ensures unbiasedness with low variance. As such, the training memory footprint of KGNNs is largely reduced with negligible accuracy loss. To evaluate the performance of our TinyKG, we conduct comprehensive experiments on real-world datasets. We found that our TinyKG with INT2 quantization aggressively reduces the memory footprint of activation maps with 7 ×, only with 2% loss in accuracy, allowing us to deploy KGNNs on memory-constrained devices.","PeriodicalId":443279,"journal":{"name":"Proceedings of the 16th ACM Conference on Recommender Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM Conference on Recommender Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3523227.3546760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

There has been an explosion of interest in designing Knowledge Graph Neural Networks (KGNNs), which achieve state-of-the-art performance and provide strong explainability for recommendation. Their promising performance mainly results from the ability to capture high-order proximity messages over knowledge graphs. However, training KGNNs at scale is challenging due to high memory usage: in the forward pass, automatic differentiation engines (e.g., TensorFlow, PyTorch) generally cache all intermediate activation maps so that gradients can be computed in the backward pass, which leads to a large GPU memory footprint. Existing work sidesteps this problem with multi-GPU distributed frameworks, but that approach is impractical when deploying KGNNs in memory-constrained environments, especially for industry-scale graphs. Here we present TinyKG, a memory-efficient GPU-based training framework for KGNNs for recommendation tasks. Specifically, TinyKG uses exact activations in the forward pass while storing a quantized version of the activations in GPU buffers. During the backward pass, these low-precision activations are dequantized back to full-precision tensors in order to compute gradients. To reduce quantization error, TinyKG applies a simple yet effective quantization algorithm to compress the activations, ensuring unbiasedness with low variance. As a result, the training memory footprint of KGNNs is substantially reduced with negligible accuracy loss. To evaluate TinyKG, we conduct comprehensive experiments on real-world datasets and find that TinyKG with INT2 quantization reduces the memory footprint of activation maps by 7× with only a 2% loss in accuracy, allowing KGNNs to be deployed on memory-constrained devices.
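The mechanism the abstract describes — exact activations in the forward pass, a low-bit quantized copy cached for the backward pass, with stochastic rounding for unbiasedness — can be illustrated with a short sketch. The following is a minimal PyTorch example of that idea, not the authors' released code: the names `quantize`, `dequantize`, and `MemoryEfficientReLU`, and the choice of ReLU as the wrapped op, are illustrative assumptions.

```python
import torch


def quantize(x: torch.Tensor, bits: int = 2):
    """Uniformly quantize x to 2**bits levels with stochastic rounding.

    Stochastic rounding rounds up with probability equal to the fractional
    part, so E[dequantize(quantize(x))] == x: the compressed activation is
    an unbiased (low-variance) estimate of the original.
    """
    levels = 2 ** bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp_min(1e-8) / levels
    normalized = (x - x_min) / scale               # values in [0, levels]
    floor = normalized.floor()
    round_up = (torch.rand_like(x) < normalized - floor).to(normalized.dtype)
    code = (floor + round_up).to(torch.uint8)      # integer codes 0..levels
    return code, x_min, scale


def dequantize(code: torch.Tensor, x_min: torch.Tensor, scale: torch.Tensor):
    return code.to(scale.dtype) * scale + x_min


class MemoryEfficientReLU(torch.autograd.Function):
    """ReLU that caches a quantized activation instead of the exact tensor."""

    @staticmethod
    def forward(ctx, x):
        out = torch.relu(x)                        # exact forward computation
        code, x_min, scale = quantize(out, bits=2)
        ctx.save_for_backward(code)                # only the low-bit buffer is kept
        ctx.x_min, ctx.scale = x_min, scale
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (code,) = ctx.saved_tensors
        out_hat = dequantize(code, ctx.x_min, ctx.scale)     # approximate activation
        return grad_out * (out_hat > 0).to(grad_out.dtype)   # ReLU gradient mask
```

A layer would then call `MemoryEfficientReLU.apply(x)` in place of `torch.relu(x)`. Note that storing the codes as `uint8`, as this sketch does for simplicity, yields only a 4× saving over FP32; a real INT2 implementation would additionally bit-pack four 2-bit codes per byte to approach the reported 7× reduction.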