时空数据预测与模型评价

IF 1.2 4区 数学 Q2 STATISTICS & PROBABILITY
Journal of Applied Statistics Pub Date : 2023-09-03 eCollection Date: 2024-01-01 DOI:10.1080/02664763.2023.2252208
G L Watson, C E Reid, M Jerrett, D Telesca
{"title":"时空数据预测与模型评价","authors":"G L Watson, C E Reid, M Jerrett, D Telesca","doi":"10.1080/02664763.2023.2252208","DOIUrl":null,"url":null,"abstract":"<p><p>Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"1 1","pages":"2007-2024"},"PeriodicalIF":1.2000,"publicationDate":"2023-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271132/pdf/","citationCount":"0","resultStr":"{\"title\":\"Prediction and model evaluation for space-time data.\",\"authors\":\"G L Watson, C E Reid, M Jerrett, D Telesca\",\"doi\":\"10.1080/02664763.2023.2252208\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.</p>\",\"PeriodicalId\":15239,\"journal\":{\"name\":\"Journal of Applied Statistics\",\"volume\":\"1 1\",\"pages\":\"2007-2024\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271132/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Applied Statistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1080/02664763.2023.2252208\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Applied Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1080/02664763.2023.2252208","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

摘要

预测误差、模型选择和时空数据模型平均的评价指标研究不足,理解不足。独立复制的缺失使得预测作为一个概念变得模糊,并且使得为独立数据开发的评估程序不适合大多数时空预测问题。受2008年加州野火期间收集的空气污染数据的启发,本文试图形式化与空间插值相关的真实预测误差。我们研究了各种交叉验证(CV)程序,采用模拟和案例研究,以深入了解替代数据分区策略所针对的估计的性质。与最近的最佳实践一致,我们发现基于位置的交叉验证适用于估计空间插值误差,就像我们对加州野火数据的分析一样。有趣的是,通常持有的CV折叠大小的偏差-方差权衡的概念并不适用于依赖数据,我们建议将留一个位置(LOLO) CV作为空间插值的首选预测误差度量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Prediction and model evaluation for space-time data.

Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Applied Statistics
Journal of Applied Statistics 数学-统计学与概率论
CiteScore
3.40
自引率
0.00%
发文量
126
审稿时长
6 months
期刊介绍: Journal of Applied Statistics provides a forum for communication between both applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided but those on theoretical developments which clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信