Attribute grouping-based categorical outlier detection using causal coupling weight
Yijing Song, Jianying Liu, Jifu Zhang
Complex & Intelligent Systems (journal article), published 2025-04-15
DOI: 10.1007/s40747-025-01869-x
Citations: 0
Abstract
For high-dimensional datasets, outlier objects can be effectively identified and extracted by exploiting the coupling relationship between any two attributes. However, when all couplings are used directly, pseudo-correlation between attribute values produces redundant coupling and degrades the effectiveness of high-dimensional outlier detection. In this paper, a novel attribute group-based outlier detection approach for categorical data is proposed, which uses attribute causal coupling weights to depict the abnormal degree of the attributes. Firstly, all attributes are automatically divided into several groups according to their local and global correlation, so that the attributes within each group are highly correlated or associated. Secondly, new concepts of causal pseudo-correlation are defined, and a case analysis shows that pseudo-correlation is the main cause of redundant attribute coupling. By constructing an attribute causality graph, pseudo-correlation is effectively avoided within each attribute group. Thirdly, an attribute causal coupling weight formula is constructed from the causality graph; it characterizes the abnormal degree of each attribute and reflects the causal coupling between any two attributes. Based on these weights, an attribute group-based outlier detection algorithm for categorical data is proposed. Finally, experimental results on UCI and synthetic datasets show that the algorithm achieves good outlier detection performance and effectively alleviates the effect of redundant coupling among attributes. Compared with competing methods, the algorithm improves the AUC and the detection efficiency by an average of 10.97% and 42.84%, respectively.
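The abstract does not give the paper's grouping criterion, causality-graph construction, or weight formula, so the following is only a minimal Python sketch of the general pipeline it outlines (group attributes, prune pseudo-correlated pairs, score objects by weighted coupling), not the authors' algorithm. Every concrete choice is an assumption: symmetric uncertainty stands in for the paper's local and global correlation; groups are taken as connected components of a thresholded correlation graph; an order-1 conditional mutual information test, in the style of the PC algorithm's skeleton phase, stands in for the causality graph; and an object's score sums, over retained edges, the edge weight times the negative log-frequency of the object's value pair. The function names (`group_attributes`, `coupling_edges`, `outlier_scores`) and parameters (`threshold`, `eps`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(values):
    """Empirical Shannon entropy (in nats) of a sequence of categorical values."""
    n = len(values)
    return -sum(c / n * log(c / n) for c in Counter(values).values())


def mutual_info(a, b):
    # I(A;B) = H(A) + H(B) - H(A,B)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def cond_mutual_info(a, b, c):
    # I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    return (entropy(list(zip(a, c))) + entropy(list(zip(b, c)))
            - entropy(list(zip(a, b, c))) - entropy(c))


def sym_uncertainty(a, b):
    """Symmetric uncertainty 2*I(A;B)/(H(A)+H(B)), a correlation measure in [0, 1].
    Assumed stand-in for the paper's (unspecified) correlation measure."""
    denom = entropy(a) + entropy(b)
    return 0.0 if denom == 0 else 2 * mutual_info(a, b) / denom


def group_attributes(cols, threshold):
    """Hypothetical grouping: connected components of the graph whose edges
    join attribute pairs with symmetric uncertainty >= threshold."""
    parent = list(range(len(cols)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(cols)), 2):
        if sym_uncertainty(cols[i], cols[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(cols)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def coupling_edges(cols, group, threshold, eps=1e-3):
    """Keep correlated attribute pairs in a group unless a third attribute
    of the group makes them (nearly) conditionally independent."""
    edges = {}
    for i, j in combinations(group, 2):
        w = sym_uncertainty(cols[i], cols[j])
        if w < threshold:
            continue
        # Order-1 conditional-independence check (PC-algorithm style):
        # treat i-j as a pseudo-correlation if some k "explains it away".
        if any(cond_mutual_info(cols[i], cols[j], cols[k]) < eps
               for k in group if k not in (i, j)):
            continue
        edges[(i, j)] = w
    return edges


def outlier_scores(rows, threshold=0.05):
    """Score each row by the weighted rarity of its value pairs on retained edges."""
    cols = list(zip(*rows))
    n = len(rows)
    scores = [0.0] * n
    for group in group_attributes(cols, threshold):
        for (i, j), w in coupling_edges(cols, group, threshold).items():
            pair_counts = Counter(zip(cols[i], cols[j]))
            for r, row in enumerate(rows):
                # -log P(x_i, x_j): rare co-occurrences contribute more.
                scores[r] += w * -log(pair_counts[(row[i], row[j])] / n)
    return scores
```

On a toy dataset where binary attributes A and C are both noisy copies of a binary B (so A and C are correlated only through B), the A-C pair passes a low correlation threshold but is dropped by the conditional-independence check, and the objects whose A and C values both disagree with B receive the highest scores. One known weakness of this design: when attributes are exact copies of one another, the conditional mutual information is zero for every pair and all edges are pruned, so a real causal-discovery step needs more care than this sketch gives it.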
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.