Yunfei He;Yang Wu;Lishan Huang;Zhenwan Peng;Fei Yang;Yiwen Zhang;Victor S Sheng
Title: AGSEI: Adaptive Graph Structure Estimation With Long-Tail Distributed Implicit Graphs
Journal: IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 3, pp. 698-713
DOI: 10.1109/TETC.2024.3480132
Publication date: 2024-10-21
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10726711/
Impact factor: 5.4 (JCR Q1, Computer Science, Information Systems)
Citations: 0
Abstract
Graph neural networks (GNNs) are powerful tools for embedding graph-structured data and have found applications across many domains. Most GNNs assume, often implicitly, that the underlying graph structure is reliable. When it is not, structures such as false links can propagate misleading information. In response, numerous graph structure learning (GSL) methods have been developed. One popular approach constructs a simple and intuitive K-nearest neighbor (KNN) graph and uses it as a sample from which to infer the true graph structure. However, a KNN graph can easily mislead this estimation. From a statistical perspective, the KNN graph, as the sample, follows a single-point distribution, whereas the true graph structure, as the population, mostly follows a long-tail distribution. In theory, the sample and the population should share the same distribution; otherwise, accurately inferring the true graph structure becomes challenging. To address this problem, this paper proposes Adaptive Graph Structure Estimation with Long-Tail Distributed Implicit Graphs, referred to as AGSEI. AGSEI comprises three main components: long-tail implicit graph construction, explicit graph structure estimation, and joint optimization. The first component uses a multi-layer graph convolutional network to learn low-order to high-order node representations, compute node similarity, and construct several corresponding long-tail implicit graphs. Because the original, imperfect graph structure can mislead GNNs into propagating false information, it reduces the reliability of these long-tail implicit graphs. AGSEI therefore limits the aggregation of irrelevant information by introducing the Hilbert-Schmidt independence criterion, that is, by maximizing the dependence between the predicted labels and the ground truth. With this strategy, AGSEI learns label-dependent node features that support the construction of reliable long-tail implicit graphs, which in turn provide adaptive multi-view graph structure information for the subsequent GSL. In the second component, the graph structure is estimated using the stochastic block model (SBM) with the Expectation-Maximization algorithm. Because a single GSL procedure struggles to approach the true graph structure, the third component jointly optimizes the long-tail implicit graph construction and the explicit graph structure estimation, alternating between the two until the model converges. We conducted experiments on five public datasets, covering tasks such as classification and clustering. These experiments not only demonstrate the performance of AGSEI but also confirm that the graph structures it estimates follow a long-tail distribution.
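The abstract's contrast between a single-point and a long-tail distribution can be illustrated with a minimal sketch. It is not the AGSEI method: the function names, the random features, and the use of preferential attachment as a stand-in for a long-tail graph are all assumptions made for illustration. The sketch only shows that every node in a directed KNN graph has exactly k out-neighbors, while a graph grown by preferential attachment has widely varying degrees.

```python
import numpy as np


def knn_out_degrees(features: np.ndarray, k: int) -> np.ndarray:
    """Out-degree of every node in a directed k-nearest-neighbor graph."""
    # Pairwise squared Euclidean distances between all feature rows.
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]  # k closest nodes per row
    adj = np.zeros(dist.shape, dtype=bool)
    rows = np.repeat(np.arange(len(features)), k)
    adj[rows, neighbors.ravel()] = True
    return adj.sum(axis=1)


def preferential_attachment_degrees(n: int, m: int, rng) -> np.ndarray:
    """Degrees of a graph grown by preferential attachment.

    Hypothetical stand-in for a long-tail graph; not taken from the paper.
    """
    degrees = np.zeros(n, dtype=int)
    degrees[: m + 1] = m  # start from a clique of m + 1 nodes
    for new in range(m + 1, n):
        # Existing nodes attract new links in proportion to their degree.
        probs = degrees[:new] / degrees[:new].sum()
        targets = rng.choice(new, size=m, replace=False, p=probs)
        degrees[targets] += 1
        degrees[new] = m
    return degrees


rng = np.random.default_rng(0)
features = rng.normal(size=(300, 16))  # synthetic node features (assumption)
knn_deg = knn_out_degrees(features, k=5)
pa_deg = preferential_attachment_degrees(300, m=5, rng=rng)

# The KNN out-degree takes a single value: a single-point distribution.
print("KNN distinct out-degrees:", np.unique(knn_deg))  # -> [5]
# The preferential-attachment degrees spread over a wide range.
print("PA min / median / max degree:",
      pa_deg.min(), int(np.median(pa_deg)), pa_deg.max())
```

The in-degrees of a KNN graph do vary somewhat, so the single-point property here refers to the fixed number of neighbors chosen per node.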
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Examples of emerging topics in computing include IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.