{"title":"知识图嵌入高效训练的动态策略","authors":"Anwesh Panda, Sathish S. Vadhiyar","doi":"10.1145/3545008.3545075","DOIUrl":null,"url":null,"abstract":"Knowledge graph embeddings (KGEs) are the low dimensional representations of entities and relations between the entities. They can be used for various downstream tasks such as triple classification, link prediction, knowledge base completion, etc. Training these embeddings for a large dataset takes a huge amount of time. This work proposes strategies to make the training of KGEs faster in a distributed memory parallel environment. The first strategy is to choose between either an all-gather or an all-reduce operation based on the sparsity of the gradient matrix. The second strategy focuses on selecting those gradient vectors which significantly contribute to the reduction in the loss. The third strategy employs gradient quantization to reduce the number of bits to be communicated. The fourth strategy proposes to split the knowledge graph triples based on relations so that inter-node communication for the gradient matrix corresponding to the relation embedding matrix is eliminated. The fifth and last strategy is to select the negative triple which the model finds difficult to classify. All the strategies are combined and this allows us to train the ComplEx Knowledge Graph Embedding (KGE) model on the FB250K dataset in 6 hours with 16 nodes when compared to 11.5 hours taken to train on the same number of nodes without applying any of the above optimizations. This reduction in training time is also accompanied by a significant improvement in Mean Reciprocal Rank (MRR) and Triple Classification Accuracy (TCA).","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings\",\"authors\":\"Anwesh Panda, Sathish S. Vadhiyar\",\"doi\":\"10.1145/3545008.3545075\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Knowledge graph embeddings (KGEs) are the low dimensional representations of entities and relations between the entities. They can be used for various downstream tasks such as triple classification, link prediction, knowledge base completion, etc. Training these embeddings for a large dataset takes a huge amount of time. This work proposes strategies to make the training of KGEs faster in a distributed memory parallel environment. The first strategy is to choose between either an all-gather or an all-reduce operation based on the sparsity of the gradient matrix. The second strategy focuses on selecting those gradient vectors which significantly contribute to the reduction in the loss. The third strategy employs gradient quantization to reduce the number of bits to be communicated. The fourth strategy proposes to split the knowledge graph triples based on relations so that inter-node communication for the gradient matrix corresponding to the relation embedding matrix is eliminated. The fifth and last strategy is to select the negative triple which the model finds difficult to classify. 
All the strategies are combined and this allows us to train the ComplEx Knowledge Graph Embedding (KGE) model on the FB250K dataset in 6 hours with 16 nodes when compared to 11.5 hours taken to train on the same number of nodes without applying any of the above optimizations. This reduction in training time is also accompanied by a significant improvement in Mean Reciprocal Rank (MRR) and Triple Classification Accuracy (TCA).\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545075\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings
Knowledge graph embeddings (KGEs) are low-dimensional representations of entities and of the relations between them. They support various downstream tasks such as triple classification, link prediction, and knowledge base completion. Training these embeddings on a large dataset is very time-consuming. This work proposes strategies to accelerate KGE training in a distributed-memory parallel environment. The first strategy chooses between an all-gather and an all-reduce operation based on the sparsity of the gradient matrix. The second selects the gradient vectors that contribute most to reducing the loss. The third employs gradient quantization to reduce the number of bits communicated. The fourth partitions the knowledge graph triples by relation, eliminating inter-node communication for the gradient of the relation embedding matrix. The fifth and last selects the negative triples that the model finds hardest to classify. Combining all five strategies allows us to train the ComplEx KGE model on the FB250K dataset in 6 hours on 16 nodes, compared to 11.5 hours on the same number of nodes without these optimizations. The reduction in training time is also accompanied by a significant improvement in Mean Reciprocal Rank (MRR) and Triple Classification Accuracy (TCA).
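The abstract gives no implementation details, but the first strategy (dynamically choosing the collective) is easy to illustrate. Below is a minimal sketch using PyTorch's torch.distributed, assuming a data-parallel setup where every worker holds a full replica of the entity embedding matrix and each mini-batch updates only a few of its rows; the sparsity_threshold knob and function names are illustrative, not taken from the paper.

```python
import torch
import torch.distributed as dist

def sync_embedding_grads(grad: torch.Tensor, sparsity_threshold: float = 0.9) -> torch.Tensor:
    """Synchronize a (num_entities x dim) embedding gradient across workers.

    Dense path: all-reduce the whole matrix. Sparse path: all-gather only
    the non-zero rows and their indices. Assumes dist.init_process_group()
    has already been called. Illustrative sketch, not the paper's code.
    """
    nonzero_rows = grad.abs().sum(dim=1).nonzero(as_tuple=False).flatten()

    # Every rank must take the same branch (collectives must match), so
    # decide using the largest non-zero row count seen on any worker.
    max_count_t = torch.tensor([nonzero_rows.numel()], device=grad.device)
    dist.all_reduce(max_count_t, op=dist.ReduceOp.MAX)
    max_count = int(max_count_t.item())
    sparsity = 1.0 - max_count / grad.shape[0]

    if sparsity < sparsity_threshold:
        # Gradient is dense enough: a plain all-reduce is cheaper.
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        return grad

    # Sparse path: pad (index, row) pairs to a common length, then gather.
    world_size = dist.get_world_size()
    idx = torch.full((max_count,), -1, dtype=torch.long, device=grad.device)
    idx[: nonzero_rows.numel()] = nonzero_rows
    rows = torch.zeros(max_count, grad.shape[1], dtype=grad.dtype, device=grad.device)
    rows[: nonzero_rows.numel()] = grad[nonzero_rows]

    all_idx = [torch.empty_like(idx) for _ in range(world_size)]
    all_rows = [torch.empty_like(rows) for _ in range(world_size)]
    dist.all_gather(all_idx, idx)
    dist.all_gather(all_rows, rows)

    # Rebuild the summed gradient from everyone's non-zero rows.
    grad.zero_()
    for i, r in zip(all_idx, all_rows):
        valid = i >= 0  # drop padding entries
        grad.index_add_(0, i[valid], r[valid])
    return grad
```

The trade-off is one extra small all-reduce to agree on the branch, against sending index/row pairs instead of the full matrix when only a small fraction of entities appear in the batch, which is the common case for large entity vocabularies.

The fifth strategy (hard negative selection) can be sketched similarly, assuming a scoring function score_fn(h, r, t) that returns higher values for more plausible triples; the candidate-sampling interface here is hypothetical.

```python
def select_hard_negative(score_fn, h: int, r: int, true_t: int,
                         num_entities: int, num_candidates: int = 64) -> torch.Tensor:
    """Sample corrupt tails and keep the one the model scores highest,
    i.e. the negative it currently finds hardest to reject.
    score_fn(h, r, t) is an assumed interface, not the paper's API.
    """
    candidates = torch.randint(0, num_entities, (num_candidates,))
    candidates = candidates[candidates != true_t]  # never pick the true tail
    h_batch = torch.full((len(candidates),), h, dtype=torch.long)
    r_batch = torch.full((len(candidates),), r, dtype=torch.long)
    with torch.no_grad():  # selection itself should not contribute gradients
        scores = score_fn(h_batch, r_batch, candidates)
    return candidates[scores.argmax()]
```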