Contra2: A one-step active learning method for imbalanced graphs
Wenjie Yang, Shengzhong Zhang, Jiaxing Guo, Zengfeng Huang
Artificial Intelligence, vol. 349, Article 104439 (published 2025-10-10)
DOI: 10.1016/j.artint.2025.104439
https://www.sciencedirect.com/science/article/pii/S0004370225001584
Citations: 0
Abstract
Graph active learning (GAL) is an important research direction in graph neural networks (GNNs) that aims to select the most valuable nodes for labeling to train GNNs. Previous works in GAL have primarily focused on the overall performance of GNNs, overlooking the balance among different classes. However, graphs in real-world applications are often imbalanced, which leads GAL methods to select class-imbalanced training sets, resulting in biased GNN models. Furthermore, due to the high cost of multi-turn queries, there is an increasing demand for one-step GAL methods, where the entire training set is queried at once. These realities prompt us to investigate the problem of one-step active learning on imbalanced graphs.
In this paper, we propose a theory-driven method called Contrast & Contract (Contra2) to tackle the above issues. The key idea of Contra2 is that intra-class edges within the majority class dominate the edge set, so contracting these edges reduces the imbalance ratio. Specifically, Contra2 first learns node representations by graph contrastive learning (GCL), then stochastically contracts the edges that connect nodes with similar embeddings. We theoretically show that Contra2 reduces the imbalance ratio with high probability. By leveraging a more evenly distributed graph, we can achieve a balanced selection of labeled nodes without requiring any seed labels. The effectiveness of Contra2 is evaluated against various baselines on 11 datasets with different budgets. Contra2 demonstrates strong results, matching or exceeding baseline performance with only half of the annotation budget on some datasets.
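The contraction step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the similarity threshold, contraction probability, and union-find merging below are illustrative assumptions; the paper's actual procedure (and its GCL embedding stage) is defined in the full text. The sketch merges endpoints of edges whose embeddings are cosine-similar into supernodes, which is how contracting the majority class's many intra-class edges can shrink its node count and lower the imbalance ratio.

```python
import numpy as np

class UnionFind:
    """Disjoint-set structure used to merge contracted nodes into supernodes."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def contract_similar_edges(edges, emb, threshold=0.9, p=1.0, seed=0):
    """Stochastically contract edges whose endpoints have cosine-similar
    embeddings. `threshold`, `p`, and `seed` are illustrative parameters,
    not values from the paper. With p=1.0 every qualifying edge is merged."""
    rng = np.random.default_rng(seed)
    uf = UnionFind(len(emb))
    for u, v in edges:
        cu, cv = emb[u], emb[v]
        sim = cu @ cv / (np.linalg.norm(cu) * np.linalg.norm(cv) + 1e-12)
        if sim >= threshold and rng.random() < p:
            uf.union(u, v)
    # Map each node to the id of its supernode.
    return [uf.find(i) for i in range(len(emb))]

# Toy graph: nodes 0-3 form a majority class with nearly identical
# embeddings, nodes 4-5 a minority class. Intra-class edges are similar
# and get contracted; the cross-class edge (3, 4) is dissimilar and kept.
emb = np.array([[1.0, 0.0], [0.99, 0.05], [1.0, 0.02], [0.98, 0.0],
                [0.0, 1.0], [0.05, 0.99]])
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = contract_similar_edges(edges, emb)
# Majority shrinks from 4 nodes to 1 supernode, minority from 2 to 1,
# so the class imbalance ratio drops from 4:2 to 1:1.
```

On this toy graph the majority class collapses into a single supernode while the cross-class edge survives, so a budget-constrained selection over supernodes is no longer dominated by the majority class.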
Journal introduction:
The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.