Bagged Regularized $k$-Distances for Anomaly Detection
Yuchao Cai, Yuheng Ma, Hanfang Yang, Hanyuan Hang
arXiv:2312.01046, arXiv - MATH - Statistics Theory, 2023-12-02
We consider unsupervised anomaly detection, the task of identifying anomalies
in a dataset without labeled examples. Although distance-based methods are
among the top performers for unsupervised anomaly detection, they are highly
sensitive to the choice of the number of nearest neighbors, $k$. In this paper,
we propose a new distance-based algorithm, bagged regularized $k$-distances for
anomaly detection (BRDAD), which converts the unsupervised anomaly detection
problem into a convex optimization problem. BRDAD selects the weights by
minimizing a surrogate risk, namely a finite-sample bound on the empirical risk
of the bagged weighted $k$-distances for density estimation (BWDDE). This
approach addresses the sensitivity to hyperparameter choice that affects
distance-based algorithms. Moreover, for large-scale datasets, the bagging
technique incorporated into BRDAD addresses efficiency issues. On the theoretical side,
we establish fast convergence rates of the AUC regret of our algorithm and
demonstrate that the bagging technique significantly reduces the computational
complexity. On the practical side, we conduct numerical experiments on anomaly
detection benchmarks that illustrate how our algorithm is less sensitive to
parameter selection than other state-of-the-art distance-based methods.
Moreover, applying the bagging technique in our algorithm yields promising
improvements on real-world datasets.
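The abstract does not spell out the estimator, so the following is only a minimal sketch under stated assumptions. It illustrates the two ingredients the abstract names: weighted $k$-distances and bagging. A point's anomaly score is taken to be a weighted sum of its distances to its $K$ nearest neighbors; a plain $k$-distance is the special case with all weight on the $k$-th neighbor. Bagging averages this score over random subsamples, so each neighbor search scans `bag_size` points rather than the whole dataset. The function names, the leave-self-out rule, and the uniform weights in the usage are hypothetical illustrative choices, not the authors' BRDAD implementation.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def weighted_k_distance(x: Point, ref: List[Point], weights: Sequence[float]) -> float:
    """Weighted sum of the distances from x to its K nearest points in ref.

    K = len(weights); weights[i] multiplies the (i+1)-th smallest distance.
    A plain k-distance is the special case weights = [0, ..., 0, 1].
    """
    if len(ref) < len(weights):
        raise ValueError("need at least len(weights) reference points")
    dists = sorted(euclidean(x, p) for p in ref)
    return sum(w * d for w, d in zip(weights, dists))


def bagged_scores(data: List[Point], weights: Sequence[float],
                  n_bags: int, bag_size: int, seed: int = 0) -> List[float]:
    """Hypothetical bagged score: each point's weighted k-distance averaged over
    n_bags random subsamples (bags) of size bag_size. Higher = more anomalous.

    Each neighbor search scans one bag instead of the full dataset, and a point
    is never used as its own neighbor (leave-self-out, an assumption here).
    """
    rng = random.Random(seed)
    n = len(data)
    scores = [0.0] * n
    for _ in range(n_bags):
        bag = rng.sample(range(n), bag_size)  # indices drawn without replacement
        for j in range(n):
            ref = [data[i] for i in bag if i != j]
            scores[j] += weighted_k_distance(data[j], ref, weights) / n_bags
    return scores


# Usage: a 5x5 grid cluster near the origin plus one isolated point.
data = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)] + [(5.0, 5.0)]
scores = bagged_scores(data, weights=[0.5, 0.5], n_bags=10, bag_size=13, seed=1)
print(max(range(len(data)), key=scores.__getitem__))  # 25: the isolated point
```

With `bag_size=13` out of 26 points, every neighbor search in this toy run touches about half the data. The fixed weights `[0.5, 0.5]` stand in for weights that, per the abstract, BRDAD would instead select by optimization.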
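The abstract states that BRDAD picks the neighbor weights by solving a convex problem, namely minimizing a surrogate risk obtained from a finite-sample bound. However, it does not give that risk. As a hedged stand-in, the sketch below uses an assumed objective: a linear term in the sorted neighbor distances $R_1 \le \dots \le R_K$ plus a squared-norm penalty with a hypothetical strength `lam`, minimized over the probability simplex. This is not the paper's surrogate risk. It only illustrates how a convex, regularized problem can replace a hard choice of $k$ with a spread of weights over several neighbors.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1}.

    Sort-based method: find the threshold theta such that
    sum_i max(v_i - theta, 0) = 1, then clip at zero.
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for j = 1..rho; keep the value at the largest such j
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def regularized_weights(knn_dists: Sequence[float], lam: float) -> List[float]:
    """Hypothetical convex weight selection (NOT the paper's surrogate risk):

        minimise  sum_i w_i * R_i + lam * sum_i w_i**2  over the simplex,

    where R_1 <= ... <= R_K are a point's sorted nearest-neighbor distances.
    Completing the square shows the minimiser is the projection of -R / (2 * lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return project_to_simplex([-r / (2.0 * lam) for r in knn_dists])


R = [1.0, 2.0, 3.0, 4.0]
print(regularized_weights(R, lam=0.01))  # [1.0, 0.0, 0.0, 0.0]: nearest neighbor only
print(regularized_weights(R, lam=1.0))   # [0.75, 0.25, 0.0, 0.0]: spread over k = 1, 2
```

A small `lam` collapses to a single neighbor, which mirrors a hard choice of $k$. As `lam` grows, the weights approach the uniform vector, so the score no longer hinges on one particular $k$. The problem is strictly convex for `lam > 0`, so the minimizer is unique.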