Adaptive Data Depth via Multi-Armed Bandits

J. Mach. Learn. Res., Pub Date: 2022-11-08, DOI: 10.48550/arXiv.2211.03985

Tavor Z. Baharav, T. Lai

Publication type: Journal Article. Pages: 155:1-155:29.


Abstract

Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of $n^d$ operations to exactly compute the depth of a single point within a data set of $n$ points in $d$-dimensional space. Often, however, we are not directly interested in the absolute depths of the points, but rather in their relative ordering. For example, we may want to find the most central point in a data set (a generalized median), or to identify and remove all outliers (points on the fringe of the data set with low depth). With this observation, we develop a novel, instance-adaptive algorithm for data depth computation by reducing the problem of exactly computing $n$ depths to an $n$-armed stochastic multi-armed bandit problem, which we can solve efficiently. We focus our exposition on simplicial depth, developed by Liu (1990), which has emerged as a promising notion of depth due to its interpretability and asymptotic properties. We provide general instance-dependent theoretical guarantees for our proposed algorithms, which readily extend to many other common measures of data depth including majority depth, Oja depth, and likelihood depth. When specialized to the case where the gaps in the data follow a power-law distribution with parameter $\alpha<2$, we show that we can reduce the complexity of identifying the deepest point in the data set (the simplicial median) from $O(n^d)$ to $\tilde{O}(n^{d-(d-1)\alpha/2})$, where $\tilde{O}$ suppresses logarithmic factors. We corroborate our theoretical results with numerical experiments on synthetic data, showing the practical utility of our proposed methods.
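To make the reduction described in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes $d=2$, treats each data point as an arm, and defines one "pull" as checking whether a uniformly random triangle formed by three other data points contains that point, so that an arm's mean reward is its simplicial depth relative to the remaining points. The arms are then compared with a textbook successive-elimination rule using Hoeffding confidence intervals. The function names (`in_triangle`, `pull`, `approx_simplicial_median`) and the parameters `delta`, `batch`, and `max_pulls` are hypothetical choices made for illustration.

```python
import math
import random


def in_triangle(p, a, b, c):
    """Return True if the 2D point p lies in the closed triangle abc."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def pull(points, i, rng):
    """One bandit pull for arm i: sample a random triangle of other points
    and report whether it contains points[i]. Requires len(points) >= 4."""
    while True:
        a, b, c = rng.sample(range(len(points)), 3)
        if i not in (a, b, c):
            break
    return in_triangle(points[i], points[a], points[b], points[c])


def approx_simplicial_median(points, delta=0.05, batch=50, max_pulls=20000, seed=0):
    """Successive elimination over the n arms (illustrative assumption:
    all active arms are pulled equally often, in rounds of `batch` pulls)."""
    rng = random.Random(seed)
    n = len(points)
    active = list(range(n))
    sums = [0] * n  # number of successful pulls per arm
    pulls = 0       # pulls made so far on every still-active arm
    while len(active) > 1 and pulls < max_pulls:
        for i in active:
            sums[i] += sum(pull(points, i, rng) for _ in range(batch))
        pulls += batch
        # Hoeffding radius with a crude union bound over arms and rounds.
        radius = math.sqrt(math.log(4 * n * pulls ** 2 / delta) / (2 * pulls))
        best_lcb = max(sums[i] / pulls for i in active) - radius
        # Keep only arms whose upper confidence bound can still reach the best.
        active = [i for i in active if sums[i] / pulls + radius >= best_lcb]
    return max(active, key=lambda i: sums[i] / pulls)


# Usage: the origin surrounded by 11 evenly spaced points on the unit circle.
pts = [(0.0, 0.0)] + [
    (math.cos(2 * math.pi * k / 11), math.sin(2 * math.pi * k / 11))
    for k in range(11)
]
print(approx_simplicial_median(pts))
```

Points that are clearly shallow are eliminated after a few rounds, so most of the sampling effort is spent on the points whose depths are close to the maximum. This is the instance-adaptive behavior the abstract refers to, although the paper's own algorithm and guarantees may differ from this sketch in important ways.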
