A. Senaratne, P. Christen, Graham J. Williams, Pouya Ghiasnezhad Omran
{"title":"Unsupervised Identification of Abnormal Nodes and Edges in Graphs","authors":"A. Senaratne, P. Christen, Graham J. Williams, Pouya Ghiasnezhad Omran","doi":"10.1145/3546912","DOIUrl":null,"url":null,"abstract":"Much of today’s data are represented as graphs, ranging from social networks to bibliographic citations. Nodes in such graphs correspond to records that generally represent entities, while edges represent relationships between these entities. Both nodes and edges in a graph can have attributes that characterize the entities and their relationships. Relationships are either explicitly known (like friends in a social network), or they are inferred using link prediction (such as two babies are siblings because they have the same mother). Any graph representing real-world data likely contains nodes and edges that are abnormal, and identifying these can be important for outlier detection in applications ranging from crime and fraud detection to viral marketing. We propose a novel approach to the unsupervised detection of abnormal nodes and edges in graphs. We first characterize nodes and edges using a set of features, and then employ a one-class classifier to identify abnormal nodes and edges. We extract patterns of features from these abnormal nodes and edges, and apply clustering to identify groups of patterns with similar characteristics. We finally visualize these abnormal patterns to show co-occurrences of features and relationships between those features that mostly influence the abnormality of nodes and edges. We evaluate our approach on datasets from diverse domains, including historical birth certificates, COVID patient records, e-mails, books, and movies. This evaluation demonstrates that our approach is well suited to identify both abnormal nodes and edges in graphs in an unsupervised way, and it can outperform several baseline anomaly detection techniques.","PeriodicalId":44355,"journal":{"name":"ACM Journal of Data and Information Quality","volume":"42 1","pages":"1 - 37"},"PeriodicalIF":1.5000,"publicationDate":"2022-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Journal of Data and Information Quality","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3546912","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 4
Abstract
Much of today’s data are represented as graphs, ranging from social networks to bibliographic citations. Nodes in such graphs correspond to records that generally represent entities, while edges represent relationships between these entities. Both nodes and edges in a graph can have attributes that characterize the entities and their relationships. Relationships are either explicitly known (like friends in a social network), or they are inferred using link prediction (such as two babies are siblings because they have the same mother). Any graph representing real-world data likely contains nodes and edges that are abnormal, and identifying these can be important for outlier detection in applications ranging from crime and fraud detection to viral marketing. We propose a novel approach to the unsupervised detection of abnormal nodes and edges in graphs. We first characterize nodes and edges using a set of features, and then employ a one-class classifier to identify abnormal nodes and edges. We extract patterns of features from these abnormal nodes and edges, and apply clustering to identify groups of patterns with similar characteristics. We finally visualize these abnormal patterns to show co-occurrences of features and relationships between those features that mostly influence the abnormality of nodes and edges. We evaluate our approach on datasets from diverse domains, including historical birth certificates, COVID patient records, e-mails, books, and movies. This evaluation demonstrates that our approach is well suited to identify both abnormal nodes and edges in graphs in an unsupervised way, and it can outperform several baseline anomaly detection techniques.