AGSEI: Adaptive Graph Structure Estimation With Long-Tail Distributed Implicit Graphs

IF 5.4 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Emerging Topics in Computing Pub Date : 2024-10-21 DOI:10.1109/TETC.2024.3480132

Yunfei He;Yang Wu;Lishan Huang;Zhenwan Peng;Fei Yang;Yiwen Zhang;Victor S Sheng

Volume 13, Issue 3, pp. 698-713 · https://ieeexplore.ieee.org/document/10726711/

Citations: 0

Abstract

Graph neural networks (GNNs) are powerful tools for embedding graph-structured data and have found applications across many domains. Most GNNs assume, often implicitly, that the underlying graph structure is reliable. When it is not, structures such as false links can propagate misleading information. Numerous graph structure learning (GSL) methods have been developed in response. One popular approach constructs a simple and intuitive K-nearest neighbor (KNN) graph and treats it as a sample from which to infer the true graph structure. However, KNN graphs can easily mislead this estimation: from a statistical perspective, the KNN graph, as the sample, follows a single-point distribution, whereas the true graph structure, as the population, mostly follows a long-tail distribution. In theory, the sample and the population should share the same distribution; otherwise, accurately inferring the true graph structure becomes difficult. To address this problem, this paper proposes Adaptive Graph Structure Estimation with Long-Tail Distributed Implicit Graphs (AGSEI). AGSEI comprises three main components: long-tail implicit graph construction, explicit graph structure estimation, and joint optimization. The first component uses a multi-layer graph convolutional network to learn low-order to high-order node representations, compute node similarity, and construct several corresponding long-tail implicit graphs. Because the original, imperfect graph structure can mislead GNNs into propagating false information, it reduces the reliability of these implicit graphs. AGSEI therefore limits the aggregation of irrelevant information by introducing the Hilbert-Schmidt independence criterion (HSIC), that is, by maximizing the dependence between the predicted labels and the ground truth. With this strategy, AGSEI learns label-dependent node features that support the construction of reliable long-tail implicit graphs, which in turn provide adaptive multi-view graph structure information for the subsequent GSL step. In the second component, the graph structure is estimated using the stochastic block model (SBM) with the Expectation-Maximization (EM) algorithm. Because a single round of GSL is unlikely to approach the true graph structure, the third component jointly optimizes the long-tail implicit graph construction and the explicit graph structure estimation, alternating between the two until the model converges. Experiments on five public datasets, covering tasks such as classification and clustering, demonstrate the performance of AGSEI and confirm that the graph structures it estimates follow a long-tail distribution.
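Two ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation; it rests on stated assumptions. First, it assumes that "single-point distribution" refers to every node in a KNN graph having the same out-degree k. Second, it uses the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, with linear kernels on one-hot labels. The function names `knn_adjacency` and `hsic`, the synthetic data, and the kernel choice are all hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_adjacency(X, k):
    """Directed KNN graph: each node links to its k most cosine-similar nodes."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # exclude self-loops
    nbrs = np.argsort(-S, axis=1)[:, :k]      # indices of the top-k neighbors per row
    A = np.zeros(S.shape, dtype=int)
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1
    return A


def hsic(K, L):
    """Biased empirical HSIC estimate from two n x n kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Synthetic node features (assumption: 60 nodes, 8 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))

A = knn_adjacency(X, k=5)
out_degree = A.sum(axis=1)   # identical for every node by construction
in_degree = A.sum(axis=0)    # not constrained by the construction

# Synthetic ground-truth labels (assumption: 3 classes) as one-hot rows.
labels = rng.integers(0, 3, size=60)
Y = np.eye(3)[labels]
K_true = Y @ Y.T             # linear kernel on the ground truth

P_good = Y.copy()                      # predictions that match the labels
P_shuffled = Y[rng.permutation(60)]    # predictions unrelated to node identity

hsic_good = hsic(P_good @ P_good.T, K_true)
hsic_shuffled = hsic(P_shuffled @ P_shuffled.T, K_true)
```

Here `out_degree` equals k for every node, which is the sense in which a KNN sample cannot reproduce a long-tail degree distribution. The HSIC value for matching predictions is never lower than the value for shuffled predictions, which is why maximizing it pushes predictions toward dependence on the labels.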




Source journal

IEEE Transactions on Emerging Topics in Computing · Computer Science: Computer Science (miscellaneous)

CiteScore

12.10

Self-citation rate

5.10%

Articles published

113

Journal description: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, and Health-care IT.