Tairan Huang, Yili Wang, Qiutong Li, Jianliang Gao
Information Processing & Management, Vol. 63, No. 1, Article 104338 (published 2025-08-25). DOI: 10.1016/j.ipm.2025.104338. https://www.sciencedirect.com/science/article/pii/S0306457325002791
Multi-faceted consistency data augmentation for graph anomaly detection
Graph-based anomaly detection has become a prominent research area, driven by critical applications in domains such as fraud detection, financial security, and biomedicine. However, existing methods face significant challenges, including label imbalance, feature camouflage, and limited supervision. In this paper, we propose McGAD, a method that incorporates two facets of consistency data augmentation: Structural Consistency Augmentation and Learnable Unsupervised Consistency Augmentation. Specifically, structural consistency augmentation uses a heat wavelet diffusion pattern to capture the spectral graph wavelet of each node and treats each wavelet as a probability distribution. McGAD then applies an empirical characteristic function to convert the wavelets into low-dimensional embeddings that characterize each node's neighborhood. We provide a mathematical proof that nodes whose neighborhoods are highly structurally consistent obtain similar structural embeddings, even when they are far apart in the graph. This gives structurally consistent nodes of the same class more informative embeddings, which helps address label imbalance and feature camouflage. In addition, we design a learnable unsupervised consistency augmentation module to handle limited supervision. Making the entire augmentation process learnable enables the model to fully exploit information from unlabeled nodes. Extensive experiments on four benchmark datasets demonstrate the superiority of McGAD. In particular, with a training ratio of only 1%, McGAD achieves 93.67% AUC and 91.81% F1-Macro on Amazon, outperforming 15 state-of-the-art baselines by up to 4.90% AUC and 3.75% F1-Macro.
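The structural consistency step (heat wavelet diffusion, wavelets read as distributions, then an empirical characteristic function) resembles the GraphWave technique. The sketch below illustrates that general idea in NumPy and is not the authors' McGAD code. It assumes the heat wavelet of node a is column a of Psi = U diag(exp(-s * lambda)) U^T, where U and lambda are the eigenvectors and eigenvalues of the combinatorial Laplacian L = D - A. It also assumes the embedding samples phi_a(t) = (1/N) * sum_m exp(i * t * Psi[m, a]) at evenly spaced points t. The scales (1.0, 5.0), the 10 sample points on [0, 100], and all function names are hypothetical choices for illustration.

```python
import numpy as np


def heat_wavelets(adj: np.ndarray, scale: float) -> np.ndarray:
    """Return Psi = U diag(exp(-scale * lam)) U^T for the Laplacian L = D - A.

    Column a of Psi is the heat wavelet centered at node a.
    Assumes an undirected graph given as a symmetric adjacency matrix.
    """
    laplacian = np.diag(adj.sum(axis=1)) - adj
    lam, U = np.linalg.eigh(laplacian)  # real eigenpairs of a symmetric matrix
    return U @ np.diag(np.exp(-scale * lam)) @ U.T


def characteristic_embedding(psi: np.ndarray, ts: np.ndarray) -> np.ndarray:
    """Sample the empirical characteristic function of every wavelet column.

    phi_a(t) = (1/N) * sum_m exp(i * t * psi[m, a]). The embedding of node a
    stacks Re(phi_a(t)) and Im(phi_a(t)) over the sample points ts.
    """
    phase = ts[:, None, None] * psi[None, :, :]  # phase[k, m, a] = ts[k] * psi[m, a]
    phi = np.exp(1j * phase).mean(axis=1)        # average over entries m -> (len(ts), N)
    return np.concatenate([phi.real, phi.imag], axis=0).T  # (N, 2 * len(ts))


def structural_embeddings(adj, scales=(1.0, 5.0), n_points=10, t_max=100.0):
    """Concatenate characteristic-function embeddings over several diffusion scales."""
    ts = np.linspace(0.0, t_max, n_points)
    parts = [characteristic_embedding(heat_wavelets(adj, s), ts) for s in scales]
    return np.concatenate(parts, axis=1)


# Toy example: a 7-node path. Nodes 0 and 6 are six hops apart but mirror images.
n = 7
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

emb = structural_embeddings(adj)
print(emb.shape)                    # (7, 40)
print(np.allclose(emb[0], emb[6]))  # True: same wavelet distribution
print(np.allclose(emb[0], emb[3]))  # False: endpoint vs. center
```

The two endpoints of the path are far apart, yet their embeddings coincide. Node 6's wavelet is a permutation of node 0's, and the empirical characteristic function ignores the order of a wavelet's entries. This toy case only illustrates the distance-independence property the abstract describes. It does not reproduce the paper's proof.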
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.