Statistical Depth Meets Machine Learning: Kernel Mean Embeddings and Depth in Functional Data Analysis

IF 1.8 · Q1 STATISTICS & PROBABILITY · CAS Tier 3 (Mathematics)
George Wynne, Stanislav Nagy
International Statistical Review, 93(2), 317–348. Published 2025-03-16.
DOI: 10.1111/insr.12611 — https://onlinelibrary.wiley.com/doi/10.1111/insr.12611
Citations: 0

Abstract

Statistical depth gauges how representative a point is with respect to a reference probability measure, and thereby introduces rankings and orderings for data living in multivariate or function spaces. Though widely applied and with much experimental success, little theoretical progress has been made in analysing functional depths. This article highlights how the common h-depth and related depths from functional data analysis can be viewed as kernel mean embeddings, a tool widely used in statistical machine learning. This connection answers several open questions about the statistical properties of functional depths. We show that (i) h-depth can be interpreted as a kernel-based method; (ii) several h-depths possess explicit expressions, so they need not be estimated by Monte Carlo procedures; (iii) under minimal assumptions, h-depths and their maximisers are uniformly strongly consistent and asymptotically Gaussian (also in infinite-dimensional spaces and for imperfectly observed functional data); and (iv) several h-depths uniquely characterise probability distributions in separable Hilbert spaces. In addition, we provide a link between depth-based and empirical-characteristic-function-based procedures for functional data. Finally, the unveiled connections enable the design of an extension of the h-depth to regression problems.
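To make the kernel connection concrete: the h-depth of a point x with respect to a measure P is D_h(x; P) = E_{X~P}[K_h(‖x − X‖)] for a smoothing kernel K_h, so the empirical depth — the average of kernel evaluations over a sample — is exactly an empirical kernel mean embedding evaluated at x. Below is a minimal sketch of that estimator; the Gaussian kernel, the bandwidth h = 5, and the grid-based L2 norm are illustrative choices for this note, not taken from the paper.

```python
import numpy as np

def h_depth(x, sample, h=1.0):
    """Empirical h-depth of a curve x with respect to a sample of curves,
    D_h(x) ~ (1/n) * sum_i K_h(||x - X_i||), with the Gaussian kernel
    K_h(t) = exp(-t^2 / h^2).

    x      : (m,)   curve evaluated on a common grid
    sample : (n, m) reference curves on the same grid
    h      : bandwidth
    """
    # L2 distances between x and every sample curve (grid approximation
    # of the functional norm).
    dists = np.linalg.norm(sample - x, axis=1)
    # The average kernel evaluation is precisely the empirical kernel
    # mean embedding of the sample, evaluated at x.
    return float(np.exp(-((dists / h) ** 2)).mean())

# Toy example: 50 noisy sine curves on a grid of 100 points.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 100)
curves = np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((50, 100))

central = np.sin(2 * np.pi * grid)        # close to the centre of the sample
outlier = np.sin(2 * np.pi * grid) + 3.0  # shifted far from every curve

# A central curve receives a higher depth value than an outlying one.
depth_central = h_depth(central, curves, h=5.0)
depth_outlier = h_depth(outlier, curves, h=5.0)
```

Because the kernel is bounded by 1, the depth always lies in (0, 1], and deeper (more central) curves score higher — this is the ranking the abstract refers to.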

Source journal: International Statistical Review (Mathematics — Statistics & Probability)
CiteScore: 4.30 · Self-citation rate: 5.00% · Articles per year: 52 · Review time: >12 weeks
Journal description: International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessments of seminal papers in the field and their impact.