Evaluating node embeddings of complex networks

Arash Dehghan-Kooshkghazi;Bogumił Kamiński;Łukasz Kraiński;Paweł Prałat;François Théberge;Ali Pinar
DOI: 10.1093/comnet/cnac030
Publication date: 2022-07-01
Citations: 11

Abstract

Graph embedding is a transformation of the nodes of a graph into a set of vectors. A good embedding should capture the graph topology, node-to-node relationships and other relevant information about the graph, its subgraphs and nodes. If these objectives are achieved, an embedding is a meaningful, understandable, compressed representation of a network that can be used as input to other machine learning tasks such as node classification, community detection or link prediction. In this article, we carry out a series of extensive experiments with selected graph embedding algorithms, on both real-world and artificially generated networks. Based on those experiments, we formulate the following general conclusions. First, we confirm the main problem of node embeddings, which is rather well known to practitioners but less documented in the literature. There are many algorithms to choose from; they use different techniques and have various parameters that may be tuned, the dimension being one of them. One needs to ensure that embeddings describe the properties of the underlying graphs well but, as our experiments confirm, how well they do so depends heavily on the properties of the network at hand and the application in mind. As a result, selecting the best embedding is a challenging task and very often requires domain experts. Since investigating embeddings in a supervised manner is computationally expensive, there is a need for an unsupervised tool that can select a handful of promising embeddings for future (supervised) investigation. A general framework, introduced recently in the literature and readily available in a GitHub repository, provides one of the very first tools for unsupervised graph embedding comparison: it assigns a ‘divergence score’ to each embedding, with the goal of distinguishing good embeddings from bad ones. We show that the divergence score strongly correlates with the quality of embeddings by investigating three main applications of node embeddings: node classification, community detection and link prediction.
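
The workflow the abstract describes (score every candidate embedding without labels, shortlist a few, then check that the unsupervised score tracks a supervised metric) can be sketched as follows. This is a minimal illustration, not the framework's actual code. The names `kendall_tau` and `shortlist`, the candidate labels `emb_a` to `emb_d`, and all numeric values are hypothetical placeholders rather than results from the article. The sketch also assumes that a lower divergence score indicates a better embedding.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a) between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def shortlist(scores, k):
    """Names of the k candidates with the lowest unsupervised score.

    Assumption: lower score means a more promising embedding.
    """
    return sorted(scores, key=scores.get)[:k]


# Placeholder values for illustration only; NOT results from the article.
divergence = {"emb_a": 0.02, "emb_b": 0.05, "emb_c": 0.11, "emb_d": 0.30}
accuracy = {"emb_a": 0.91, "emb_b": 0.88, "emb_c": 0.80, "emb_d": 0.62}

names = sorted(divergence)
tau = kendall_tau(
    [divergence[n] for n in names],
    [accuracy[n] for n in names],
)
top2 = shortlist(divergence, 2)

print(f"Kendall tau between divergence and accuracy: {tau:+.2f}")
print("Shortlisted for supervised evaluation:", top2)
```

Under the stated assumption, a strongly negative tau means the unsupervised ranking agrees with the supervised one, so only the shortlisted embeddings would need the expensive supervised evaluation.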