Evaluating node embeddings of complex networks

IF 1.5 · Zone 4 (Mathematics) · Q2 · MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Complex Networks · Pub Date: 2022-07-01 · DOI: 10.1093/comnet/cnac030

Arash Dehghan-Kooshkghazi;Bogumił Kamiński;Łukasz Kraiński;Paweł Prałat;François Théberge;Ali Pinar


Citations: 11

Abstract

Graph embedding is a transformation of the nodes of a graph into a set of vectors. A good embedding should capture the graph topology, node-to-node relationships and other relevant information about the graph, its subgraphs and nodes. If these objectives are achieved, an embedding is a meaningful, understandable, compressed representation of a network that can be used by other machine learning tools for tasks such as node classification, community detection or link prediction. In this article, we perform a series of extensive experiments with selected graph embedding algorithms, on both real-world networks and artificially generated ones. Based on those experiments, we formulate the following general conclusions. First, we confirm the main problem of node embeddings, which is rather well known to practitioners but less documented in the literature. There are many algorithms to choose from, which use different techniques and have various parameters that may be tuned, the dimension being one of them. One needs to ensure that embeddings describe the properties of the underlying graphs well but, as our experiments confirm, how well they do so depends heavily on the properties of the network at hand and the application in mind. As a result, selecting the best embedding is a challenging task and very often requires domain experts. Since investigating embeddings in a supervised manner is computationally expensive, there is a need for an unsupervised tool that is able to select a handful of promising embeddings for future (supervised) investigation. A general framework, introduced recently in the literature and easily available in a GitHub repository, provides one of the very first tools for unsupervised graph embedding comparison by assigning a ‘divergence score’ to embeddings with the goal of distinguishing good ones from bad ones. We show that the divergence score strongly correlates with the quality of embeddings by investigating three main applications of node embeddings: node classification, community detection and link prediction.


Source journal

Journal of Complex Networks (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS)

CiteScore: 4.20

Self-citation rate: 9.50%

About the journal: Journal of Complex Networks publishes original articles and reviews with a significant contribution to the analysis and understanding of complex networks and their applications in diverse fields. Complex networks are loosely defined as networks with nontrivial topology and dynamics, which appear as the skeletons of complex systems in the real world. The journal covers everything from the basic mathematical, physical and computational principles needed for studying complex networks to their applications leading to predictive models in molecular, biological, ecological, informational, engineering, social, technological and other systems. It includes, but is not limited to, the following topics:
- Mathematical and numerical analysis of networks
- Network theory and computer sciences
- Structural analysis of networks
- Dynamics on networks
- Physical models on networks
- Networks and epidemiology
- Social, socio-economic and political networks
- Ecological networks
- Technological and infrastructural networks
- Brain and tissue networks
- Biological and molecular networks
- Spatial networks
- Techno-social networks, i.e. online social networks, social networking sites, social media
- Other applications of networks
- Evolving networks
- Multilayer networks
- Game theory on networks
- Biomedicine related networks
- Animal social networks
- Climate networks
- Cognitive, language and informational networks