交叉验证中渐近最优带宽的显式解法

IF 2.4 2区 数学 Q2 BIOLOGY
Biometrika Pub Date : 2024-02-08 DOI:10.1093/biomet/asae007
Karim M Abadir, Michel Lubrano
{"title":"交叉验证中渐近最优带宽的显式解法","authors":"Karim M Abadir, Michel Lubrano","doi":"10.1093/biomet/asae007","DOIUrl":null,"url":null,"abstract":"Summary We show that least squares cross-validation methods share a common structure which has an explicit asymptotic solution, when the chosen kernel is asymptotically separable in bandwidth and data. For density estimation with a multivariate Student t(ν) kernel, the cross-validation criterion becomes asymptotically equivalent to a polynomial of only three terms. Our bandwidth formulae are simple and noniterative thus leading to very fast computations, their integrated squared-error dominates traditional cross-validation implementations, they alleviate the notorious sample variability of cross-validation, and overcome its breakdown in the case of repeated observations. We illustrate our method with univariate and bivariate applications, of density estimation and nonparametric regressions, to a large dataset of Michigan State University academic wages and experience.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Explicit solutions for the asymptotically-optimal bandwidth in cross-validation\",\"authors\":\"Karim M Abadir, Michel Lubrano\",\"doi\":\"10.1093/biomet/asae007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary We show that least squares cross-validation methods share a common structure which has an explicit asymptotic solution, when the chosen kernel is asymptotically separable in bandwidth and data. For density estimation with a multivariate Student t(ν) kernel, the cross-validation criterion becomes asymptotically equivalent to a polynomial of only three terms. Our bandwidth formulae are simple and noniterative thus leading to very fast computations, their integrated squared-error dominates traditional cross-validation implementations, they alleviate the notorious sample variability of cross-validation, and overcome its breakdown in the case of repeated observations. We illustrate our method with univariate and bivariate applications, of density estimation and nonparametric regressions, to a large dataset of Michigan State University academic wages and experience.\",\"PeriodicalId\":9001,\"journal\":{\"name\":\"Biometrika\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-02-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biometrika\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/biomet/asae007\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrika","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomet/asae007","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

摘要 我们证明了最小二乘交叉验证方法有一个共同的结构,当所选的核在带宽和数据上是渐进可分的时候,这个结构有一个明确的渐进解。对于使用多变量 Student t(ν) 核的密度估计,交叉验证准则在渐近上等价于一个只有三个项的多项式。我们的带宽计算公式简单且无需迭代,因此计算速度非常快,其综合平方误差在传统的交叉验证实现中占优势,缓解了交叉验证中众所周知的样本变异性,并克服了其在重复观测情况下的缺陷。我们在密歇根州立大学学术工资和经验的大型数据集上,用密度估计和非参数回归的单变量和双变量应用来说明我们的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Explicit solutions for the asymptotically-optimal bandwidth in cross-validation
Summary We show that least squares cross-validation methods share a common structure which has an explicit asymptotic solution, when the chosen kernel is asymptotically separable in bandwidth and data. For density estimation with a multivariate Student t(ν) kernel, the cross-validation criterion becomes asymptotically equivalent to a polynomial of only three terms. Our bandwidth formulae are simple and noniterative thus leading to very fast computations, their integrated squared-error dominates traditional cross-validation implementations, they alleviate the notorious sample variability of cross-validation, and overcome its breakdown in the case of repeated observations. We illustrate our method with univariate and bivariate applications, of density estimation and nonparametric regressions, to a large dataset of Michigan State University academic wages and experience.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Biometrika
Biometrika 生物-生物学
CiteScore
5.50
自引率
3.70%
发文量
56
审稿时长
6-12 weeks
期刊介绍: Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信