Title: Online Outlier Detection in Open Feature Spaces
Authors: Heng Lian; Yi He; Di Wu; Zhong Chen; Xingquan Zhu; Xindong Wu
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 6091-6106
Publication date: 2025-08-06
DOI: 10.1109/TKDE.2025.3593895
URL: https://ieeexplore.ieee.org/document/11117179/
Impact factor: 10.4 (JCR Q1, Computer Science, Artificial Intelligence)

Abstract:
Outlier detection is essential for data compliance, fraud prevention, and strategic decision-making. Finding outliers relies on studying the feature space to identify anomalous instances. As the feature dimension increases, the process becomes more complicated and models are hindered from finding genuine outliers. In this paper, we investigate an increasingly challenging task, the online outlier detection (OOD) problem, where the data points to be examined are characterized by two dynamic changes: (1) increasing volume instead of a static set; and (2) an evolving feature space instead of a known set. Such instance and feature space dynamics impede traditional outlier detection (OD) techniques that rely on geometric data structure to distinguish outliers. To address this, we propose a new approach, Online Outlier Detection in Open Feature Spaces, which circumvents this limitation by learning a latent hypersphere representation that positions regular data points inside its boundary and anomalous data points outside it. The crux of our approach is a tailored reconstruction loss that allows each data point to be represented as the sum of its pertinent feature embeddings. Each of these embeddings is updated non-intrusively, enabling efficient and incremental learning of the latent hypersphere. Extensive experiments on twelve benchmark datasets demonstrate the robustness and superior performance of our method against seven leading competitors.
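The abstract gives no equations, so the following is only a toy sketch of the general idea it describes, not the authors' implementation. It assumes each feature owns an embedding vector, an instance is the value-weighted sum of the embeddings of the features it carries, and the outlier score is the distance from that sum to a fixed hypersphere center. The class name `SumEmbeddingHypersphere`, the embedding dimension, the learning rate, the fixed center, and the plain distance-to-center loss are all assumptions made for illustration; the paper's tailored reconstruction loss is not reproduced here.

```python
import math
import random


class SumEmbeddingHypersphere:
    """Toy model (hypothetical): instance = sum of value * embedding over its features."""

    def __init__(self, dim=4, lr=0.05, seed=0):
        self.dim = dim
        self.lr = lr
        self.rng = random.Random(seed)
        self.emb = {}                # feature name -> embedding vector
        self.center = [1.0] * dim    # assumed fixed, non-zero hypersphere center

    def _embedding(self, name):
        # A feature seen for the first time gets a fresh small random embedding.
        if name not in self.emb:
            self.emb[name] = [self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]
        return self.emb[name]

    def represent(self, x):
        # Latent point: value-weighted sum of the embeddings of the features present.
        z = [0.0] * self.dim
        for name, value in x.items():
            e = self._embedding(name)
            for k in range(self.dim):
                z[k] += value * e[k]
        return z

    def score(self, x):
        # Outlier score: Euclidean distance from the latent point to the center.
        # Note: scoring an instance with an unseen feature also registers that feature.
        z = self.represent(x)
        return math.sqrt(sum((zk - ck) ** 2 for zk, ck in zip(z, self.center)))

    def update(self, x):
        # One gradient step on 0.5 * ||z - center||^2.
        # Only the embeddings of features present in x are touched.
        z = self.represent(x)
        diff = [zk - ck for zk, ck in zip(z, self.center)]
        for name, value in x.items():
            e = self.emb[name]
            for k in range(self.dim):
                e[k] -= self.lr * value * diff[k]


# Simulated stream: feature "c" only starts to appear halfway through.
noise = random.Random(1)
model = SumEmbeddingHypersphere()

for t in range(400):
    x = {"a": 1.0 + noise.gauss(0.0, 0.05), "b": 1.0 + noise.gauss(0.0, 0.05)}
    if t >= 200:
        x["c"] = 1.0 + noise.gauss(0.0, 0.05)
    model.update(x)

normal_score = model.score({"a": 1.0, "b": 1.0, "c": 1.0})
outlier_score = model.score({"a": 5.0, "b": 4.0, "c": 1.0})
print(normal_score, outlier_score)
```

In this sketch, instances that resemble the stream land near the center, while an instance with unusual feature values lands far from it. A newly arriving feature needs only a new dictionary entry and no retraining of the existing embeddings, which is one plausible reading of the incremental, "non-intrusive" updates mentioned above.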
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.