Edge-Centric Embeddings of Digraphs: Properties and Stability Under Sparsification.

IF 2.1; Region 3 (Physics and Astrophysics); JCR Q2 (PHYSICS, MULTIDISCIPLINARY)

Entropy. Pub Date: 2025-03-14. DOI: 10.3390/e27030304

Ahmed Begga, Francisco Escolano Ruiz, Miguel Ángel Lozano

Entropy, vol. 27, no. 3. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11941605/pdf/

Citations: 0

Abstract

In this paper, we define and characterize the embedding of edges and higher-order entities in directed graphs (digraphs) and relate these embeddings to those of nodes. Our edge-centric approach consists of the following: (a) embedding line digraphs (or their iterated versions); (b) exploiting the rank properties of these embeddings to show that edge/path similarity can be posed as a linear combination of node similarities; (c) solving scalability issues through digraph sparsification; (d) evaluating the performance of these embeddings for classification and clustering. We begin by identifying the motivation for edge-centric approaches. We then introduce all the elements of the approach and, finally, validate it. Our edge-centric embedding entails a top-down mining of links, instead of inferring them from the similarities of node embeddings. This analysis is key to discovering the inter-subgraph links that keep the whole graph connected, i.e., central edges. Using digraphs allows us to cluster edge-like hubs and authorities. In addition, since directed edges inherit their labels from destination (origin) nodes, their embedding provides a proxy representation for node classification and clustering as well. This representation is obtained by embedding the line digraph of the original digraph. The line digraph has convenient formal properties with respect to the original graph; in particular, it produces more entropic latent spaces. With these properties at hand, we can relate edge embeddings to node embeddings. The main contribution of this paper is to state and prove the linearity theorem, which poses each element of the transition matrix for an edge embedding as a linear combination of the elements of the transition matrix for the node embedding. As a result, the rank preservation property explains why embedding the line digraph and using the labels of the destination nodes provides better classification and clustering performance than embedding the nodes of the original graph. In other words, we not only facilitate edge mining but also improve node classification and clustering. However, computing the line digraph is challenging, and a sparsification strategy is implemented for the sake of scalability. Our experimental results show that the line digraph representation of the sparsified input graph is quite stable as we increase the sparsification level, and also that it outperforms the original (node-centric) representation. For the sake of simplicity, our theorem relies on node2vec-like (factorization) embeddings. However, we also include several experiments showing how line digraphs may improve the performance of Graph Neural Networks (GNNs), also following the principle of maximum entropy.
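As a rough illustration of the objects the abstract refers to, the following minimal sketch builds the line digraph of a toy digraph and compares one-step random-walk transition probabilities on edges and on nodes. It is not the paper's method or its linearity theorem: the function names `line_digraph` and `transition`, the toy edge list, and the restriction to simple unweighted digraphs are all assumptions made for the example. It only checks the elementary fact that a walk stepping from edge (u, v) to edge (v, w) has the same probability as a node walk stepping from v to w, and it counts line-digraph arcs to show why the construction grows quickly.

```python
from collections import defaultdict

def line_digraph(edges):
    """Arcs of the line digraph: edge (u, v) points to every edge (v, w).

    Assumes a simple digraph (no repeated edges); hypothetical helper.
    """
    out_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append((u, v))
    return [(e, f) for e in edges for f in out_edges[e[1]]]

def transition(arcs):
    """Row-normalised random-walk probabilities as {(src, dst): p}."""
    out_deg = defaultdict(int)
    for src, _ in arcs:
        out_deg[src] += 1
    return {(src, dst): 1.0 / out_deg[src] for src, dst in arcs}

# Toy digraph chosen for illustration only (not taken from the paper).
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]

line_arcs = line_digraph(edges)
P_node = transition(edges)      # keys are (v, w) node pairs
P_edge = transition(line_arcs)  # keys are ((u, v), (v, w)) edge pairs

# One-step relation: leaving edge (u, v) for (v, w) is as likely as
# leaving node v for node w in the original digraph.
for (e, f), p in P_edge.items():
    assert e[1] == f[0]
    assert abs(p - P_node[f]) < 1e-12

# Size of the line digraph: sum over nodes of in-degree * out-degree.
in_deg, out_deg = defaultdict(int), defaultdict(int)
for u, v in edges:
    out_deg[u] += 1
    in_deg[v] += 1
expected_arcs = sum(in_deg[v] * out_deg[v] for v in set(in_deg) | set(out_deg))
assert len(line_arcs) == expected_arcs

print(len(edges), "edges ->", len(line_arcs), "line-digraph arcs")
```

Because the arc count scales with the product of in-degree and out-degree at each node, high-degree nodes dominate the size of the line digraph, which is consistent with the abstract's remark that sparsification is needed for scalability.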

Source journal

Entropy (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 4.90

Self-citation rate: 11.10%

Articles published: 1580

Review time: 21.05 days

About the journal: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Its aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. Where computations or experiments are involved, the details must be provided so that the results can be reproduced.