Authors: Jiachen Zhao; Fang Deng; Jiaqi Zhu; Jie Chen
Journal: IEEE Transactions on Big Data, vol. 9, no. 4, pp. 1198-1209 (JCR Q1, Computer Science, Information Systems; impact factor 7.5)
Publication date: 2023-04-17
DOI: 10.1109/TBDATA.2023.3265509
URL: https://ieeexplore.ieee.org/document/10103526/
Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection
Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose the ensemble DIP to improve prediction accuracy. The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.
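The path-climbing idea in the abstract can be illustrated with a small sketch. The abstract does not give the density estimator, the rule for choosing each step, or the exact way distance and density increment are combined, so everything below is an assumption made for illustration: density is taken as the inverse mean distance to the k nearest neighbours, each step moves to the closest of those neighbours with strictly higher density, and the climbing difficulty is the sum over steps of distance times density increment. The names `dip_scores`, `k`, and `next_step` are hypothetical and not taken from the paper.

```python
import numpy as np

def dip_scores(X, k=5):
    """Toy density-increasing-path scores (illustrative, not the paper's method)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Exclude each point from its own neighbour list, then take the k nearest.
    D_nb = D.copy()
    np.fill_diagonal(D_nb, np.inf)
    nn = np.argsort(D_nb, axis=1)[:, :k]
    # Assumed density: inverse of the mean distance to the k nearest neighbours.
    density = 1.0 / (D[np.arange(n)[:, None], nn].mean(axis=1) + 1e-12)

    # Assumed step rule: move to the closest neighbour with strictly higher
    # density; a point with no denser neighbour is treated as a local peak (-1).
    next_step = np.full(n, -1)
    for i in range(n):
        for j in nn[i]:  # neighbours are ordered by increasing distance
            if density[j] > density[i]:
                next_step[i] = j
                break

    # Assumed climbing difficulty: sum of (step length * density gain) along
    # the path. Density strictly increases at every step, so the walk ends.
    scores = np.zeros(n)
    for i in range(n):
        cur = i
        while next_step[cur] != -1:
            nxt = next_step[cur]
            scores[i] += D[cur, nxt] * (density[nxt] - density[cur])
            cur = nxt
    return scores, next_step

# Usage: a 5x5 grid of "normal" points plus one distant point.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, [[10.0, 10.0]]])
scores, next_step = dip_scores(X, k=5)
most_anomalous = int(np.argmax(scores))
```

In this sketch the number of peaks is not set in advance: it is simply the number of points left with `next_step == -1`, which loosely mirrors the abstract's claim that the number of peaks is decided adaptively. Because every step is restricted to nearby neighbours, a path can follow a curved cluster instead of jumping straight to a peak.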
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.