Concentration bounds for the empirical angular measure with statistical learning applications

IF 1.7 · Zone 2, Mathematics · Q2 Statistics & Probability

Bernoulli · Pub Date: 2021-04-07 · DOI: 10.3150/22-bej1562

Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, J. Segers


Citations: 11

Abstract

The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation that the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. It is the purpose of the paper to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and, up to logarithmic factors, scale as the square root of the effective sample size. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
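The construction described in the abstract (rank standardization, then keeping the angles of the most extreme observations) can be illustrated with a minimal sketch. It is not the paper's exact estimator: the function names, the rank-to-Pareto map `1 / (1 - rank / (n + 1))`, the sup-norm, and the rule "keep the `k` points with largest norm" are all assumptions made for illustration, and the paper's thresholding and normalization may differ.

```python
import numpy as np


def rank_transform(X):
    """Standardize each column to roughly unit-Pareto margins using ranks.

    Assumption: empirical CDF values rank / (n + 1), which stay below 1.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Double argsort gives 0-based ranks per column; shift to 1..n.
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    return 1.0 / (1.0 - ranks / (n + 1.0))


def extreme_angles(X, k):
    """Angles (points on the sup-norm unit sphere) of the k most extreme rows."""
    V = rank_transform(X)
    radii = np.linalg.norm(V, ord=np.inf, axis=1)
    top = np.argsort(radii)[-k:]  # indices of the k largest radii
    return V[top] / radii[top, None]


def empirical_angular_measure(X, k, indicator):
    """Fraction of the k extreme angles falling in the set given by `indicator`.

    `indicator` maps an array of angles (k x d) to a boolean array of length k.
    """
    angles = extreme_angles(X, k)
    return float(np.mean(indicator(angles)))


# Illustrative data only: independent Gaussian components, n = 1000, k = 50.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
# Mass assigned to the region where both components are simultaneously large.
mass = empirical_angular_measure(X, k=50, indicator=lambda a: a.min(axis=1) > 0.5)
```

Here `k` plays the role of the effective sample size mentioned in the abstract: only the `k` retained observations enter the estimate, which is why the deviation bounds are stated in terms of `k` rather than the full sample size.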


Source journal

Bernoulli (Mathematics: Statistics & Probability)

CiteScore: 3.40

Self-citation rate: 0.00%

Articles published: 116

Review time: 6-12 weeks

Journal description: BERNOULLI is the journal of the Bernoulli Society for Mathematical Statistics and Probability, issued four times per year. The journal provides a comprehensive account of important developments in the fields of statistics and probability, offering an international forum for both theoretical and applied work. BERNOULLI publishes papers containing original and significant research contributions, with background, mathematical derivation, and discussion of the results in suitable detail and, where appropriate, with discussion of interesting applications in relation to the methodology proposed. Papers of the following two types are also considered for publication, provided they are judged to enhance the dissemination of research: review papers that provide an integrated critical survey of some area of probability and statistics and discuss important recent developments, and scholarly papers on some historically significant aspect of statistics and probability.