Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs

Hang Cui, Tarek Abdelzaher
{"title":"Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs","authors":"Hang Cui, Tarek Abdelzaher","doi":"arxiv-2409.07712","DOIUrl":null,"url":null,"abstract":"In the broader machine learning literature, data-generation methods\ndemonstrate promising results by generating additional informative training\nexamples via augmenting sparse labels. Such methods are less studied in graphs\ndue to the intricate dependencies among nodes in complex topology structures.\nThis paper presents a novel node generation method that infuses a small set of\nhigh-quality synthesized nodes into the graph as additional labeled nodes to\noptimally expand the propagation of labeled information. By simply infusing\nadditional nodes, the framework is orthogonal to the graph learning and\ndownstream classification techniques, and thus is compatible with most popular\ngraph pre-training (self-supervised learning), semi-supervised learning, and\nmeta-learning methods. The contribution lies in designing the generated node\nset by solving a novel optimization problem. The optimization places the\ngenerated nodes in a manner that: (1) minimizes the classification loss to\nguarantee training accuracy and (2) maximizes label propagation to\nlow-confidence nodes in the downstream task to ensure high-quality propagation.\nTheoretically, we show that the above dual optimization maximizes the global\nconfidence of node classification. Our Experiments demonstrate statistically\nsignificant performance improvements over 14 baselines on 10 publicly available\ndatasets.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Social and Information Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07712","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In the broader machine learning literature, data-generation methods demonstrate promising results by generating additional informative training examples to augment sparse labels. Such methods are less studied in graphs due to the intricate dependencies among nodes in complex topology structures. This paper presents a novel node generation method that infuses a small set of high-quality synthesized nodes into the graph as additional labeled nodes to optimally expand the propagation of labeled information. Because it simply infuses additional nodes, the framework is orthogonal to the graph learning and downstream classification techniques, and is thus compatible with most popular graph pre-training (self-supervised learning), semi-supervised learning, and meta-learning methods. The contribution lies in designing the generated node set by solving a novel optimization problem. The optimization places the generated nodes so as to: (1) minimize the classification loss to guarantee training accuracy and (2) maximize label propagation to low-confidence nodes in the downstream task to ensure high-quality propagation. Theoretically, we show that this dual optimization maximizes the global confidence of node classification. Our experiments demonstrate statistically significant performance improvements over 14 baselines on 10 publicly available datasets.
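The abstract only sketches the dual objective, so the following is a minimal, hypothetical PyTorch sketch of the general idea rather than the authors' implementation. The toy random graph, the shared 2-layer GCN, the sigmoid-weighted virtual-to-real edges, the round-robin label assignment for virtual nodes, and the confidence-weighted entropy term standing in for objective (2) are all assumptions made for illustration.

```python
# Sketch (not the paper's method): learn a few virtual labeled nodes so that
# (1) classification loss on labeled nodes stays low and
# (2) label information propagates to low-confidence unlabeled nodes.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, c, k = 100, 16, 3, 5                   # real nodes, feature dim, classes, virtual nodes
X = torch.randn(n, d)                        # toy real-node features
A = (torch.rand(n, n) < 0.05).float()
A = ((A + A.t()) > 0).float()                # symmetric toy adjacency
y = torch.randint(0, c, (n,))
labeled = torch.zeros(n, dtype=torch.bool)
labeled[:10] = True                          # sparsely labeled graph

# Learnable virtual nodes: features and soft edges to real nodes (assumed parameterization).
Xv = torch.nn.Parameter(torch.randn(k, d))
Ev = torch.nn.Parameter(torch.zeros(k, n))   # logits of virtual-to-real edge weights
yv = torch.arange(k) % c                     # hypothetical round-robin labels for virtual nodes

# Simple 2-layer GCN shared by real and virtual nodes (an assumption; any GNN could be used).
W1 = torch.nn.Parameter(0.1 * torch.randn(d, 32))
W2 = torch.nn.Parameter(0.1 * torch.randn(32, c))

def forward():
    Evw = torch.sigmoid(Ev)                              # soft virtual-to-real edges in [0, 1]
    Aug = torch.cat([torch.cat([A, Evw.t()], 1),
                     torch.cat([Evw, torch.zeros(k, k)], 1)], 0)
    Aug = Aug + torch.eye(n + k)                         # self-loops
    Dinv = Aug.sum(1).clamp(min=1e-6).pow(-0.5)
    Ahat = Dinv[:, None] * Aug * Dinv[None, :]           # symmetric normalization
    H = torch.relu(Ahat @ torch.cat([X, Xv]) @ W1)
    return Ahat @ H @ W2                                 # logits for all n + k nodes

opt = torch.optim.Adam([Xv, Ev, W1, W2], lr=0.01)
for step in range(200):
    logits = forward()
    p = logits.softmax(-1)
    # (1) classification loss on real labeled nodes and on the virtual labeled nodes
    cls = F.cross_entropy(logits[:n][labeled], y[labeled]) + F.cross_entropy(logits[n:], yv)
    # (2) proxy for maximizing propagation: penalize predictive entropy on low-confidence real nodes
    conf = p[:n].max(-1).values.detach()
    ent = -(p[:n] * p[:n].clamp(min=1e-9).log()).sum(-1)
    prop = ((1 - conf) * ent)[~labeled].mean()
    loss = cls + 0.5 * prop
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this sketch the virtual-node features Xv and edge logits Ev are the optimization variables; the paper instead designs the generated node set by solving its own optimization problem, after which any downstream graph learner can be trained on the augmented graph.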