Concentration bounds for the empirical angular measure with statistical learning applications
Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, J. Segers
Journal: Bernoulli (impact factor 1.5; JCR Q2, Statistics & Probability). DOI: https://doi.org/10.3150/22-bej1562. Publication date: 2021-04-07.
Citations: 11
Abstract
The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation where the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. This paper establishes finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and, up to logarithmic factors, scale as the inverse of the square root of the effective sample size. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
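The construction described in the abstract (rank-based standardization followed by a count over the most extreme observations) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the unit-Pareto scaling `n / (n + 1 - rank)`, the sup-norm, the threshold `n / k`, and the tie-breaking rule are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def rank_transform(X):
    """Standardize each column to approximately unit-Pareto margins using ranks.

    Assumption: the empirical cdf is taken with a strict inequality, so that
    1 / (1 - F_hat) equals n / (n + 1 - rank) and lies in [1, n].
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Column-wise ranks in 1..n; ties are broken by sort order (assumption).
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    return n / (n + 1.0 - ranks)


def empirical_angular_measure(X, k, in_set, norm_ord=np.inf):
    """Estimate the angular mass of a set A of the sphere.

    X        : (n, d) array of observations
    k        : number of extremes per margin (the effective sample size)
    in_set   : hypothetical callable, returns True if an angle w lies in A
    norm_ord : norm defining radius and sphere (sup-norm here, an assumption)
    """
    V = rank_transform(X)
    n = V.shape[0]
    radii = np.linalg.norm(V, ord=norm_ord, axis=1)
    extreme = radii >= n / k                 # keep points far from the origin
    angles = V[extreme] / radii[extreme, None]  # project onto the unit sphere
    hits = np.array([in_set(w) for w in angles], dtype=bool)
    return hits.sum() / k
```

For example, `empirical_angular_measure(X, k=50, in_set=lambda w: w[0] >= w[1])` would estimate the angular mass of the part of the sphere where the first coordinate dominates the second. Because only ranks enter the computation, the estimate is unchanged under increasing transformations of each margin, which is the robustness property the abstract attributes to the rank transformation.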
About the journal:
BERNOULLI is the journal of the Bernoulli Society for Mathematical Statistics and Probability, issued four times per year. The journal provides a comprehensive account of important developments in the fields of statistics and probability, offering an international forum for both theoretical and applied work.
BERNOULLI will publish:
Papers containing original and significant research contributions: with background, mathematical derivation and discussion of the results in suitable detail and, where appropriate, with discussion of interesting applications in relation to the methodology proposed.
Papers of the following two types will also be considered for publication, provided they are judged to enhance the dissemination of research:
Review papers which provide an integrated critical survey of some area of probability and statistics and discuss important recent developments.
Scholarly written papers on some historically significant aspect of statistics and probability.