Rethinking the impact of noisy labels in graph classification: A utility and privacy perspective

Impact factor: 6.0 · CAS Tier 1 (Computer Science) · JCR Q1, Computer Science, Artificial Intelligence
De Li, Xianxian Li, Zeming Gan, Qiyu Li, Bin Qu, Jinyan Wang
Neural Networks, Volume 182, Article 106919. DOI: 10.1016/j.neunet.2024.106919. Published online: 2024-11-20.
https://www.sciencedirect.com/science/article/pii/S0893608024008487
Citations: 0

Abstract

Graph neural networks (GNNs) based on message-passing mechanisms have achieved strong results in graph classification tasks. However, their generalization performance degrades when noisy labels are present in the training data. Most existing approaches to noisy labels focus on the visual domain or on graph node classification, and analyze the impact of noisy labels only from a utility perspective. Unlike existing work, in this paper we measure the effects of noisy labels on graph classification from both a data-privacy and a model-utility perspective. We find that noisy labels degrade the model's generalization performance and strengthen membership inference attacks against graph data privacy. To this end, we propose a robust graph neural network approach (RGLC) for graph classification with noisy labels. Specifically, we first filter noisy samples accurately using high-confidence samples together with the first principal component vector of each class's features. Then, the robust principal component vectors and the model's outputs under data augmentation are used to correct noisy labels, guided by this dual source of spatial information. Finally, supervised graph contrastive learning is introduced to improve the quality of the model's embeddings and protect the privacy of the training graph data. The utility and privacy of the proposed method are validated against twelve baseline methods on eight real graph classification datasets. Compared with state-of-the-art methods, RGLC achieves performance gains of at most 7.8% and at least 0.8% at a 30% noisy-label rate, and reduces the accuracy of privacy attacks to below 60%.
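The filtering step in the abstract can be illustrated with a minimal sketch: estimate a per-class principal direction from high-confidence embeddings, then flag samples whose embeddings align poorly with their labeled class's direction. All function names, thresholds, and the use of the leading singular vector as the "first principal component" are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of per-class principal-component noise filtering.
# Thresholds and the uncentered-SVD choice are illustrative assumptions.
import numpy as np

def class_direction(embeddings):
    """Leading singular direction of a stack of embedding row-vectors,
    used here as a per-class principal-component prototype direction."""
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[0]

def flag_noisy(embeddings, labels, confidences,
               conf_thresh=0.9, align_thresh=0.5):
    """Flag samples whose embedding has low |cosine| similarity with the
    principal direction estimated from high-confidence samples of their
    labeled class."""
    noisy = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        trusted = idx[confidences[idx] >= conf_thresh]
        if len(trusted) == 0:
            continue  # no trusted samples for this class; skip filtering
        pc = class_direction(embeddings[trusted])
        norms = np.linalg.norm(embeddings[idx], axis=1) + 1e-12
        align = np.abs(embeddings[idx] @ pc) / norms  # |cosine| with class PC
        noisy[idx] = align < align_thresh
    return noisy
```

On synthetic data with two well-separated class directions, a sample carrying the wrong label aligns with the other class's principal direction and is flagged, while clean samples are kept; the abstract's subsequent correction and contrastive-learning steps would then operate on these flags.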
Source journal

Neural Networks (Engineering/Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles per year: 425
Review time: 67 days
About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.