Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs

Hang Cui, Tarek Abdelzaher

arXiv - CS - Social and Information Networks · 2024-09-12 · DOI: arxiv-2409.07712 · Citations: 0
Abstract
In the broader machine learning literature, data-generation methods
demonstrate promising results by synthesizing additional informative training
examples to augment sparse labels. Such methods are less studied in graphs
due to the intricate dependencies among nodes in complex topological structures.
This paper presents a novel node generation method that infuses a small set of
high-quality synthesized nodes into the graph as additional labeled nodes to
optimally expand the propagation of labeled information. Because it only infuses
additional nodes, the framework is orthogonal to the graph-learning and
downstream classification techniques, and is thus compatible with most popular
graph pre-training (self-supervised learning), semi-supervised learning, and
meta-learning methods. The contribution lies in designing the generated node
set by solving a novel optimization problem. The optimization places the
generated nodes in a manner that: (1) minimizes the classification loss to
guarantee training accuracy and (2) maximizes label propagation to
low-confidence nodes in the downstream task to ensure high-quality propagation.
Theoretically, we show that the above dual optimization maximizes the global
confidence of node classification. Our experiments demonstrate statistically
significant performance improvements over 14 baselines on 10 publicly available
datasets.
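The dual objective described in the abstract — keep classification loss low on labeled nodes while maximizing propagation confidence on low-confidence nodes — can be illustrated with a toy score. Everything below (function names, the simple averaging propagation scheme, the `lam` trade-off weight) is an assumption for illustration only, not the authors' implementation:

```python
# Hypothetical sketch of a dual objective for virtual-node placement:
# (1) cross-entropy on labeled nodes, minus
# (2) lam * mean prediction confidence on unlabeled nodes.
# All names and the propagation scheme are illustrative assumptions.
import numpy as np

def normalize_adj(A):
    """Row-normalize an adjacency matrix after adding self-loops."""
    A = A + np.eye(A.shape[0])
    return A / A.sum(axis=1, keepdims=True)

def propagate(A_hat, Y, steps=3):
    """Toy label propagation: repeatedly average neighbors' label scores."""
    Z = Y.copy()
    for _ in range(steps):
        Z = A_hat @ Z
    return Z

def dual_objective(A, Y, labeled_mask, lam=0.5):
    """Score a graph (possibly augmented with synthetic labeled nodes).

    Lower is better: small classification loss on labeled nodes,
    high confidence (max class probability) on unlabeled nodes.
    """
    A_hat = normalize_adj(A)
    Z = propagate(A_hat, Y)
    P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)  # softmax scores
    # Cross-entropy of labeled nodes against their one-hot labels.
    ce = -np.log(P[labeled_mask, Y[labeled_mask].argmax(axis=1)] + 1e-9).mean()
    # Mean confidence over unlabeled nodes (the propagation term).
    conf = P[~labeled_mask].max(axis=1).mean()
    return ce - lam * conf
```

A candidate set of generated nodes would then be chosen to minimize this kind of score over the augmented graph; the paper's actual optimization and theory are considerably more involved than this averaging toy.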