{"title":"知识图嵌入高效训练的动态策略","authors":"Anwesh Panda, Sathish S. Vadhiyar","doi":"10.1145/3545008.3545075","DOIUrl":null,"url":null,"abstract":"Knowledge graph embeddings (KGEs) are the low dimensional representations of entities and relations between the entities. They can be used for various downstream tasks such as triple classification, link prediction, knowledge base completion, etc. Training these embeddings for a large dataset takes a huge amount of time. This work proposes strategies to make the training of KGEs faster in a distributed memory parallel environment. The first strategy is to choose between either an all-gather or an all-reduce operation based on the sparsity of the gradient matrix. The second strategy focuses on selecting those gradient vectors which significantly contribute to the reduction in the loss. The third strategy employs gradient quantization to reduce the number of bits to be communicated. The fourth strategy proposes to split the knowledge graph triples based on relations so that inter-node communication for the gradient matrix corresponding to the relation embedding matrix is eliminated. The fifth and last strategy is to select the negative triple which the model finds difficult to classify. All the strategies are combined and this allows us to train the ComplEx Knowledge Graph Embedding (KGE) model on the FB250K dataset in 6 hours with 16 nodes when compared to 11.5 hours taken to train on the same number of nodes without applying any of the above optimizations. This reduction in training time is also accompanied by a significant improvement in Mean Reciprocal Rank (MRR) and Triple Classification Accuracy (TCA).","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings\",\"authors\":\"Anwesh Panda, Sathish S. Vadhiyar\",\"doi\":\"10.1145/3545008.3545075\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Knowledge graph embeddings (KGEs) are the low dimensional representations of entities and relations between the entities. They can be used for various downstream tasks such as triple classification, link prediction, knowledge base completion, etc. Training these embeddings for a large dataset takes a huge amount of time. This work proposes strategies to make the training of KGEs faster in a distributed memory parallel environment. The first strategy is to choose between either an all-gather or an all-reduce operation based on the sparsity of the gradient matrix. The second strategy focuses on selecting those gradient vectors which significantly contribute to the reduction in the loss. The third strategy employs gradient quantization to reduce the number of bits to be communicated. The fourth strategy proposes to split the knowledge graph triples based on relations so that inter-node communication for the gradient matrix corresponding to the relation embedding matrix is eliminated. The fifth and last strategy is to select the negative triple which the model finds difficult to classify. 
All the strategies are combined and this allows us to train the ComplEx Knowledge Graph Embedding (KGE) model on the FB250K dataset in 6 hours with 16 nodes when compared to 11.5 hours taken to train on the same number of nodes without applying any of the above optimizations. This reduction in training time is also accompanied by a significant improvement in Mean Reciprocal Rank (MRR) and Triple Classification Accuracy (TCA).\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545075\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings
Knowledge graph embeddings (KGEs) are low-dimensional representations of entities and of the relations between them. They support various downstream tasks such as triple classification, link prediction, and knowledge base completion. Training these embeddings on a large dataset is very time-consuming. This work proposes strategies to accelerate KGE training in a distributed-memory parallel environment. The first strategy chooses between an all-gather and an all-reduce operation based on the sparsity of the gradient matrix. The second selects the gradient vectors that contribute most to reducing the loss. The third employs gradient quantization to reduce the number of bits communicated. The fourth partitions the knowledge graph triples by relation, eliminating inter-node communication for the gradient of the relation embedding matrix. The fifth and last selects the negative triples that the model finds hardest to classify. Combining all five strategies allows us to train the ComplEx KGE model on the FB250K dataset in 6 hours on 16 nodes, compared to 11.5 hours on the same number of nodes without these optimizations. The reduction in training time is also accompanied by a significant improvement in Mean Reciprocal Rank (MRR) and Triple Classification Accuracy (TCA).
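The abstract gives no implementation details, but the first strategy (dynamically choosing the collective) is easy to illustrate. Below is a minimal sketch using PyTorch's torch.distributed, assuming a data-parallel setup where every worker holds a full replica of the entity embedding matrix and each mini-batch updates only a few of its rows; the sparsity_threshold knob and function names are illustrative, not taken from the paper.

```python
import torch
import torch.distributed as dist

def sync_embedding_grads(grad: torch.Tensor, sparsity_threshold: float = 0.9) -> torch.Tensor:
    """Synchronize a (num_entities x dim) embedding gradient across workers.

    Dense path: all-reduce the whole matrix. Sparse path: all-gather only
    the non-zero rows and their indices. Assumes dist.init_process_group()
    has already been called. Illustrative sketch, not the paper's code.
    """
    nonzero_rows = grad.abs().sum(dim=1).nonzero(as_tuple=False).flatten()

    # Every rank must take the same branch (collectives must match), so
    # decide using the largest non-zero row count seen on any worker.
    max_count_t = torch.tensor([nonzero_rows.numel()], device=grad.device)
    dist.all_reduce(max_count_t, op=dist.ReduceOp.MAX)
    max_count = int(max_count_t.item())
    sparsity = 1.0 - max_count / grad.shape[0]

    if sparsity < sparsity_threshold:
        # Gradient is dense enough: a plain all-reduce is cheaper.
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        return grad

    # Sparse path: pad (index, row) pairs to a common length, then gather.
    world_size = dist.get_world_size()
    idx = torch.full((max_count,), -1, dtype=torch.long, device=grad.device)
    idx[: nonzero_rows.numel()] = nonzero_rows
    rows = torch.zeros(max_count, grad.shape[1], dtype=grad.dtype, device=grad.device)
    rows[: nonzero_rows.numel()] = grad[nonzero_rows]

    all_idx = [torch.empty_like(idx) for _ in range(world_size)]
    all_rows = [torch.empty_like(rows) for _ in range(world_size)]
    dist.all_gather(all_idx, idx)
    dist.all_gather(all_rows, rows)

    # Rebuild the summed gradient from everyone's non-zero rows.
    grad.zero_()
    for i, r in zip(all_idx, all_rows):
        valid = i >= 0  # drop padding entries
        grad.index_add_(0, i[valid], r[valid])
    return grad
```

The trade-off is one extra small all-reduce to agree on the branch, against sending index/row pairs instead of the full matrix when only a small fraction of entities appear in the batch, which is the common case for large entity vocabularies.

The fifth strategy (hard negative selection) can be sketched similarly, assuming a scoring function score_fn(h, r, t) that returns higher values for more plausible triples; the candidate-sampling interface here is hypothetical.

```python
def select_hard_negative(score_fn, h: int, r: int, true_t: int,
                         num_entities: int, num_candidates: int = 64) -> torch.Tensor:
    """Sample corrupt tails and keep the one the model scores highest,
    i.e. the negative it currently finds hardest to reject.
    score_fn(h, r, t) is an assumed interface, not the paper's API.
    """
    candidates = torch.randint(0, num_entities, (num_candidates,))
    candidates = candidates[candidates != true_t]  # never pick the true tail
    h_batch = torch.full((len(candidates),), h, dtype=torch.long)
    r_batch = torch.full((len(candidates),), r, dtype=torch.long)
    with torch.no_grad():  # selection itself should not contribute gradients
        scores = score_fn(h_batch, r_batch, candidates)
    return candidates[scores.argmax()]
```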