A statistical learning view of simple Kriging

IF 1.2, Zone 4 (Mathematics), Q2 Statistics & Probability
Test, Pub Date: 2023-11-21, DOI: 10.1007/s11749-023-00891-w
Emilia Siviero, Emilie Chautru, Stephan Clémençon
Citations: 0

Abstract

In the Big Data era, and with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees on the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task, the flagship problem in Geostatistics, from a statistical learning perspective, i.e., by carrying out a nonparametric finite-sample predictive analysis. Given \(d \ge 1\) values taken by a realization of a square integrable random field \(X = \{X_s\}_{s \in S}\), \(S \subset \mathbb{R}^2\), with unknown covariance structure, at sites \(s_1, \ldots, s_d\) in \(S\), the goal is to predict the unknown values it takes at any other location \(s \in S\) with minimum quadratic risk. The prediction rule is derived from a training spatial dataset: a single realization \(X'\) of \(X\), independent of those to be predicted, observed at \(n \ge 1\) locations \(\sigma_1, \ldots, \sigma_n\) in \(S\). Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, because the training data \(X'_{\sigma_1}, \ldots, X'_{\sigma_n}\) involved in the learning procedure are not independent and identically distributed. In this article, non-asymptotic bounds of order \(O_{\mathbb{P}}(1/\sqrt{n})\) are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid in the learning stage. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments on simulated data and on real-world datasets, and hopefully pave the way for further developments in statistical learning based on spatial data.
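
To make the prediction task in the abstract concrete, the following is a minimal sketch of the classical simple Kriging predictor when the covariance function is known. It is not the authors' plug-in rule, which would replace the covariance with an estimate built from the training realization. The function names `exp_cov` and `simple_kriging`, the exponential form of the covariance, its parameter values, and the zero-mean assumption are all illustrative choices that do not come from the abstract.

```python
import numpy as np

def exp_cov(h, sill=1.0, scale=0.5):
    """Isotropic exponential covariance c(h) = sill * exp(-h / scale).

    Hypothetical choice for illustration; the abstract leaves the covariance unknown.
    """
    return sill * np.exp(-np.asarray(h, dtype=float) / scale)

def simple_kriging(sites, values, target, cov=exp_cov):
    """Predict a zero-mean field at `target` from `values` observed at `sites`.

    Returns the linear prediction and its quadratic risk (the Kriging variance).
    """
    sites = np.asarray(sites, dtype=float)    # shape (d, 2)
    values = np.asarray(values, dtype=float)  # shape (d,)
    target = np.asarray(target, dtype=float)  # shape (2,)

    # Covariance matrix between the d observation sites.
    dists = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    sigma = cov(dists)

    # Covariance vector between the observation sites and the target location.
    c = cov(np.linalg.norm(sites - target, axis=-1))

    # The weights solve sigma @ lam = c, which minimizes
    # E[(X_s - lam @ (X_{s_1}, ..., X_{s_d}))^2] over linear predictors.
    lam = np.linalg.solve(sigma, c)

    prediction = lam @ values
    risk = cov(0.0) - c @ lam
    return prediction, risk

# Three observation sites in the plane and the values observed there (made-up numbers).
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [0.3, -0.1, 0.8]

pred_at_site, risk_at_site = simple_kriging(sites, values, target=(0.0, 0.0))
pred_inside, risk_inside = simple_kriging(sites, values, target=(0.5, 0.5))
```

With no measurement noise in this sketch, predicting at an observed site reproduces the observed value with zero risk, while the risk at an unobserved location lies strictly between zero and the variance of the field.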

Source journal: Test (Mathematics: Statistics and Probability)
CiteScore: 2.20
Self-citation rate: 7.70%
Articles published: 41
Review time: >12 weeks
Journal description: TEST is an international journal of Statistics and Probability, sponsored by the Spanish Society of Statistics and Operations Research. English is the official language of the journal. TEST emphasizes papers containing original theoretical contributions of direct or potential value in applications. In this respect, the methodological content is considered crucial for papers published in TEST, but the practical implications of the methodology are also relevant. Original, sound manuscripts on either well-established or emerging areas within the scope of the journal are welcome. One volume is published annually in four issues. In addition to the regular contributions, each issue of TEST contains an invited paper, with discussions, by an internationally recognized statistician on a current and challenging topic.