Rethinking Unsupervised Graph Anomaly Detection With Deep Learning: Residuals and Objectives

IF 8.9 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)
Xiaoxiao Ma;Fanzhen Liu;Jia Wu;Jian Yang;Shan Xue;Quan Z. Sheng
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 2, pp. 881-895
Publication date: 2024-11-18
Publication type: Journal Article
DOI: 10.1109/TKDE.2024.3501307
URL: https://ieeexplore.ieee.org/document/10756792/
Open access: No
Citations: 0

Abstract

Anomalies often occur in real-world information networks/graphs, such as malevolent users in online review networks and fake news in social media. When such structured network data is represented as a graph, anomalies usually appear as anomalous nodes that exhibit significantly deviating structural patterns, different attributes, or both. To date, numerous unsupervised methods have been developed to detect anomalies based on residual analysis, which assumes that anomalies will introduce larger residual errors (i.e., graph reconstruction loss). While these existing works have achieved encouraging performance, in this paper we formally prove that the learning objectives they employ, i.e., MSE and cross-entropy losses, encounter significant limitations in learning the major data distributions, particularly for anomaly detection. Through our preliminary study, we also reveal that vanilla residual analysis-based methods cannot effectively investigate the rich graph structure. Building on these findings, we propose a novel structure-biased graph anomaly detection framework (SALAD) to capture anomalies’ divergent patterns with the assistance of a specially designed node representation augmentation approach. We further present two training objectives that enable SALAD to capture the major structure and attribute distributions by placing less emphasis on anomalies that introduce higher reconstruction errors under the encoder-decoder framework. The detection performance on eight widely used datasets demonstrates SALAD's superiority over twelve state-of-the-art baselines. Additional ablation and case studies validate that our data augmentation method and training objectives account for this performance.
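To make the residual-analysis assumption concrete, the following is a minimal sketch of the vanilla idea the abstract refers to: reconstruct the adjacency and attribute matrices, then score each node by its reconstruction residual. It is not the SALAD framework. The function name `residual_scores`, the parameters `rank` and `alpha`, the toy graph, and the use of a truncated SVD in place of a learned encoder-decoder are all assumptions made for illustration.

```python
import numpy as np


def residual_scores(adj, attrs, rank=2, alpha=0.5):
    """Score nodes by reconstruction residuals (higher = more anomalous).

    A truncated SVD stands in for a learned encoder-decoder. This is an
    illustrative assumption, not the model described in the paper.
    """
    def low_rank(m):
        # Best rank-`rank` approximation in the least-squares (MSE) sense.
        u, s, vt = np.linalg.svd(m, full_matrices=False)
        return (u[:, :rank] * s[:rank]) @ vt[:rank]

    # Per-node residual: L2 norm of each row of (input - reconstruction).
    struct_res = np.linalg.norm(adj - low_rank(adj), axis=1)
    attr_res = np.linalg.norm(attrs - low_rank(attrs), axis=1)
    # Hypothetical weighting between structure and attribute residuals.
    return alpha * struct_res + (1 - alpha) * attr_res


# Toy graph (hypothetical data): two 5-node cliques, so every node has the
# same structural pattern.
n = 10
adj = np.zeros((n, n))
adj[:5, :5] = 1.0
adj[5:, 5:] = 1.0
np.fill_diagonal(adj, 0.0)

# Attributes follow one of two community patterns, except node 9, whose
# attributes mix both patterns.
attrs = np.zeros((n, 4))
attrs[:5] = [1.0, 1.0, 0.0, 0.0]
attrs[5:9] = [0.0, 0.0, 1.0, 1.0]
attrs[9] = [1.0, 0.0, 0.0, 1.0]

scores = residual_scores(adj, attrs)
print(int(np.argmax(scores)))
```

In this toy setup the structural residuals are identical for all nodes, so the ranking is driven by the attribute residual, and the node with the off-pattern attributes (index 9) should receive the largest score. The reconstruction here is itself fit under a squared-error criterion, which is the kind of objective whose limitations the abstract says the paper analyzes.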
Source journal
IEEE Transactions on Knowledge and Data Engineering
Category: Engineering Technology - Engineering: Electrical & Electronic
CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months
Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.