Cross-validation on extreme regions

IF 1.1 | Zone 3, Mathematics | JCR Q3: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
Anass Aghbalou, Patrice Bertail, François Portier, Anne Sabourin
Journal: Extremes. Published: 2024-09-03. DOI: 10.1007/s10687-024-00495-z (https://doi.org/10.1007/s10687-024-00495-z)
Citations: 0

Abstract

We conduct a non-asymptotic study of the Cross-Validation (CV) estimate of the generalization risk for learning algorithms dedicated to extreme regions of the covariate space. In this context, which has recently been analysed from an Extreme Value Analysis perspective, the risk function measures the algorithm’s error given that the norm of the input exceeds a high quantile. The main challenge within this framework is the negligible size of the extreme training sample with respect to the full sample size and the necessity to re-scale the risk function by a probability tending to zero. We open the road to a finite-sample understanding of CV for extreme values by establishing two new results: an exponential probability bound on the K-fold CV error and a polynomial probability bound on the leave-p-out CV. Our bounds are sharp in the sense that they match state-of-the-art guarantees for standard CV estimates while extending them to encompass a conditioning event of small probability. We illustrate the significance of our results regarding high-dimensional classification in extreme regions via a Lasso-type logistic regression algorithm. The tightness of our bounds is investigated in numerical experiments.
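
To make the quantity under study concrete, the following is a minimal sketch of a K-fold CV estimate of the risk on an extreme region, that is, the classification error given that the norm of the input exceeds a high quantile. It is an illustration under stated assumptions, not the authors' procedure: the function names (`extreme_kfold_cv_risk`, `fit_centroids`, `predict_centroids`), the choice to compute the threshold as an empirical quantile of the training norms in each fold, and the rescaling by `alpha` times the validation fold size are all hypothetical conventions. A simple nearest-centroid rule on the angle X/||X|| stands in for the Lasso-type logistic regression mentioned in the abstract, and the data are synthetic.

```python
import numpy as np


def fit_centroids(X, y):
    """Fit a nearest-centroid rule on the angular part X / ||X|| (stand-in classifier)."""
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = np.unique(y)
    centroids = np.stack([angles[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict_centroids(model, X):
    """Predict the label of the closest class centroid on the unit sphere."""
    labels, centroids = model
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    dists = np.linalg.norm(angles[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]


def extreme_kfold_cv_risk(X, y, fit, predict, alpha=0.1, n_folds=5, seed=0):
    """K-fold CV estimate of P(g(X) != y | ||X|| > t), with t a (1 - alpha)-quantile of ||X||.

    Assumed convention: t is estimated on the training part of each fold, and the
    error count is divided by alpha * (validation size), the expected number of
    extreme validation points, rather than by the full fold size.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    norms = np.linalg.norm(X, axis=1)
    folds = np.array_split(rng.permutation(n), n_folds)
    fold_risks = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        t = np.quantile(norms[train_idx], 1 - alpha)
        # Only points whose norm exceeds the threshold are used to train and to validate.
        train_ext = train_idx[norms[train_idx] > t]
        val_ext = val_idx[norms[val_idx] > t]
        model = fit(X[train_ext], y[train_ext])
        errors = predict(model, X[val_ext]) != y[val_ext]
        fold_risks.append(errors.sum() / (alpha * len(val_idx)))
    return float(np.mean(fold_risks))


# Synthetic example: heavy-tailed radius, label encoded in the angle.
rng = np.random.default_rng(42)
n = 2000
y = rng.integers(0, 2, size=n)
theta = np.where(y == 1, np.pi / 3, np.pi / 6) + 0.05 * rng.standard_normal(n)
radius = 1.0 + rng.pareto(2.0, size=n)
X = radius[:, None] * np.column_stack([np.cos(theta), np.sin(theta)])

risk = extreme_kfold_cv_risk(X, y, fit_centroids, predict_centroids, alpha=0.1, n_folds=5)
print(f"K-fold CV estimate of the extreme-region risk: {risk:.4f}")
```

With `alpha=0.1` and five folds, each model is trained on roughly 160 of the 2000 points, which illustrates the difficulty the abstract describes: the extreme training sample is small relative to the full sample, and the error count is rescaled by a small probability.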

Source journal
Extremes
Categories: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; STATISTICS & PROBABILITY
CiteScore: 2.20
Self-citation rate: 7.70%
Articles published: 15
Review time: >12 weeks
About the journal: Extremes publishes original research on all aspects of statistical extreme value theory and its applications in science, engineering, economics and other fields. Authoritative and timely reviews of theoretical advances and of extreme value methods and problems in important applied areas, including detailed case studies, are welcome and will be a regular feature. All papers are refereed. Publication will be swift: in particular, electronic submission and correspondence are encouraged. Statistical extreme value methods encompass a very wide range of problems: extreme waves, rainfall, and floods are of basic importance in oceanography and hydrology, as are high windspeeds and extreme temperatures in meteorology and catastrophic claims in insurance. The waveforms and extremes of random loads determine lifelengths in structural safety, corrosion and metal fatigue.