Traditional kriging versus modern Gaussian processes for large-scale mining data

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2022-07-20. DOI: 10.1002/sam.11635

R. Christianson, R. Pollyea, R. Gramacy

Citations: 1

Abstract

The canonical technique for nonlinear modeling of spatial/point-referenced data is known as kriging in geostatistics, and as Gaussian Process (GP) regression for surrogate modeling and statistical learning. This article reviews many similarities shared between kriging and GPs, but also highlights some important differences. One is that GPs impose a process that can be used to automate kernel/variogram inference, thus removing the human from the loop. The GP framework also suggests a probabilistically valid means of scaling to handle a large corpus of training data, that is, an alternative to ordinary kriging. Finally, recent GP implementations are tailored to make the most of modern computing architectures, such as multi-core workstations and multi-node supercomputers. We argue that such distinctions are important even in classically geostatistical settings. To back that up, we present out-of-sample validation exercises using two real, large-scale borehole data sets acquired in the mining of gold and other minerals. We compare classic kriging with several variations of modern GPs and conclude that the latter are more economical (requiring fewer human and compute resources), more accurate, and better at uncertainty quantification. We go on to show how the fully generative modeling apparatus provided by GPs can gracefully accommodate left-censoring of small measurements, as commonly occurs in mining data and other borehole assays.
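To make the idea of automated kernel/variogram inference concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a zero-mean GP on one-dimensional inputs, a squared-exponential kernel, a fixed nugget, and a coarse grid search over the lengthscale in place of numerical optimization. All function names and the toy data are hypothetical. The lengthscale is chosen by maximizing the concentrated log-likelihood, with the scale parameter profiled out, rather than by fitting a variogram by eye.

```python
import math

def kernel(a, b, lengthscale):
    """Squared-exponential correlation between two 1-D locations."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_solve(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, lengthscale, nugget=1e-4):
    """Fit a zero-mean GP for one fixed lengthscale (assumed setup)."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j], lengthscale) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_solve(L, forward_solve(L, ys))        # K^{-1} y
    scale = sum(y * a for y, a in zip(ys, alpha)) / n       # profiled-out scale
    # Concentrated log-likelihood, up to an additive constant:
    # -(n/2) log(scale) - (1/2) log|K|, with log|K| = 2 * sum(log L_ii).
    loglik = -0.5 * n * math.log(scale) - sum(math.log(L[i][i]) for i in range(n))
    return {"xs": xs, "L": L, "alpha": alpha, "scale": scale,
            "lengthscale": lengthscale, "nugget": nugget, "loglik": loglik}

def select_lengthscale(xs, ys, grid):
    """Return the fitted model whose lengthscale maximizes the likelihood."""
    return max((fit_gp(xs, ys, ell) for ell in grid), key=lambda m: m["loglik"])

def predict(model, x_new):
    """Predictive mean and variance at a new location."""
    k_star = [kernel(x_new, x, model["lengthscale"]) for x in model["xs"]]
    mean = sum(k * a for k, a in zip(k_star, model["alpha"]))
    v = forward_solve(model["L"], k_star)
    var = model["scale"] * (1.0 + model["nugget"] - sum(vi * vi for vi in v))
    return mean, var

# Toy data (hypothetical): noiseless samples of sin(x) on a regular grid.
xs = [0.5 * i for i in range(13)]
ys = [math.sin(x) for x in xs]
grid = [0.5, 1.0, 2.0]

model = select_lengthscale(xs, ys, grid)
mean, var = predict(model, 1.25)
print(model["lengthscale"], mean, var)
```

The predictive variance returned alongside the mean is the uncertainty quantification the abstract refers to. In practice, the grid search would be replaced by a numerical optimizer, and the dense Cholesky factorization by an approximation suited to large training sets.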
