Prediction for Big Data Through Kriging: Small Sequential and One-Shot Designs

Q3 Business, Management and Accounting
J. Kleijnen, Wim C. M. van Beers
American Journal of Mathematical and Management Sciences, vol. 39, no. 1, pp. 199-213. Journal article, published 2020-01-30.
DOI: 10.1080/01966324.2020.1716281 (https://doi.org/10.1080/01966324.2020.1716281)
Citations: 13

Abstract

Kriging, or Gaussian process (GP) modeling, is an interpolation method that assumes the outputs (responses) are more strongly correlated the closer the inputs (explanatory or independent variables) are. Such a GP has unknown (hyper)parameters that are usually estimated through the maximum-likelihood method. With big data, however, computing these estimated parameters, the corresponding Kriging predictor, and its predictor variance becomes problematic. To solve this problem, some authors select a relatively small subset from the big set of previously observed “old” data. These selection methods are sequential, and they depend on the variance of the Kriging predictor; this variance requires a specific Kriging model and the estimation of its parameters. The resulting designs turn out to be “local”; i.e., most selected old input combinations are concentrated around the new combination to be predicted. We develop a simpler one-shot (fixed-sample, non-sequential) design; i.e., from the big data set we select a small subset consisting of the nearest neighbors of the new combination. To compare our designs and the sequential designs empirically, we use the squared prediction errors in several numerical experiments. These experiments show that our design may yield reasonable performance.
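The one-shot idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Euclidean distance for the nearest-neighbor selection, an ordinary Kriging model with a Gaussian correlation function, and a fixed correlation parameter `theta` in place of the maximum-likelihood estimate the abstract mentions. The test function, the sample sizes, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def nearest_neighbor_design(X_old, x_new, k):
    """Indices of the k old input combinations closest to x_new (Euclidean)."""
    dists = np.linalg.norm(X_old - x_new, axis=1)
    return np.argsort(dists)[:k]


def gaussian_corr(A, B, theta):
    """Gaussian correlation exp(-theta * squared distance) between rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * sq_dist)


def ordinary_kriging_predict(X, y, x_new, theta, nugget=1e-8):
    """Ordinary Kriging predictor at x_new.

    theta is fixed here (an assumption); the small nugget only stabilizes
    the linear solves and is not part of the model in the abstract.
    """
    n = len(y)
    R = gaussian_corr(X, X, theta) + nugget * np.eye(n)
    r = gaussian_corr(X, x_new[None, :], theta).ravel()
    ones = np.ones(n)
    # Generalized-least-squares estimate of the constant mean.
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    return mu + r @ np.linalg.solve(R, y - mu * ones)


def true_response(X):
    """Hypothetical smooth test function standing in for the simulation output."""
    return np.sin(4.0 * X[..., 0]) + np.cos(3.0 * X[..., 1])


rng = np.random.default_rng(0)
X_old = rng.uniform(0.0, 1.0, size=(2000, 2))  # big set of "old" inputs
y_old = true_response(X_old)                   # their observed outputs
x_new = np.array([0.4, 0.6])                   # new combination to predict

# One-shot design: pick the 15 nearest neighbors once, then fit and predict.
idx = nearest_neighbor_design(X_old, x_new, k=15)
y_hat = ordinary_kriging_predict(X_old[idx], y_old[idx], x_new, theta=50.0)

# Squared prediction error, the comparison criterion named in the abstract.
sq_err = (y_hat - true_response(x_new)) ** 2
print(f"prediction: {y_hat:.4f}, squared prediction error: {sq_err:.2e}")
```

Because only the 15 selected points enter the correlation matrix, each linear solve is on a 15 x 15 system rather than a 2000 x 2000 one, which is the computational point of selecting a small subset. The selection itself needs no Kriging model or predictor variance, in contrast to the sequential methods described above.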
Source journal
American Journal of Mathematical and Management Sciences (Business, Management and Accounting: all)
CiteScore: 2.70
Self-citation rate: 0.00%
Articles published: 5