Prediction and model evaluation for space-time data.

Impact factor 1.2; CAS ranking: Zone 4 (Mathematics); JCR Q2 (Statistics & Probability)

Journal of Applied Statistics. Pub Date: 2023-09-03. eCollection Date: 2024-01-01. DOI: 10.1080/02664763.2023.2252208

G L Watson, C E Reid, M Jerrett, D Telesca

Pages: 2007-2024. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271132/pdf/

Abstract

Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.
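As a rough illustration of the leave-one-location-out (LOLO) idea described in the abstract, the sketch below holds out every observation from one location per fold, so that each test site is unseen during fitting. It is not the authors' implementation: the inverse-distance-weighted interpolator (`idw_predict`), the function `lolo_cv_rmse`, the `(location_id, x, y, t, value)` row layout, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def idw_predict(train_rows, x, y, t, power=2.0):
    """Stand-in interpolator (an assumption, not the paper's model):
    inverse-distance-weighted mean of training values observed at time t."""
    num = den = 0.0
    for _, xi, yi, ti, vi in train_rows:
        if ti != t:
            continue
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # exact co-location: return the observed value
        w = d ** -power
        num += w * vi
        den += w
    if den == 0.0:
        raise ValueError(f"no training observations at time {t}")
    return num / den


def lolo_cv_rmse(rows, predict=idw_predict):
    """Leave-one-location-out CV: each fold holds out all observations
    from a single location and predicts them from the other locations."""
    locations = sorted({r[0] for r in rows})
    sq_errors = []
    for held_out in locations:
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        for _, x, y, t, v in test:
            sq_errors.append((predict(train, x, y, t) - v) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Synthetic data (hypothetical): 12 monitors, 5 time steps,
# a smooth spatial field with a time trend plus Gaussian noise.
rng = random.Random(0)
sites = [(i, rng.random(), rng.random()) for i in range(12)]
rows = [
    (i, x, y, t, math.sin(3 * x) + math.cos(3 * y) + 0.1 * t + rng.gauss(0, 0.05))
    for i, x, y in sites
    for t in range(5)
]

rmse = lolo_cv_rmse(rows)
print(f"LOLO CV RMSE: {rmse:.3f}")
```

Splitting by location rather than by individual observation is the point of the sketch: a random split would leave other time points from the same monitor in the training set, so the resulting error would not reflect prediction at an unmonitored site.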

Source journal

Journal of Applied Statistics (Mathematics: Statistics & Probability)

CiteScore: 3.40
Self-citation rate: 0.00%
Articles published: 126
Review time: 6 months

About the journal: Journal of Applied Statistics provides a forum for communication between both applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided but those on theoretical developments which clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.