Using cross-validation methods to select time series models: Promises and pitfalls

IF 1.5 3区 心理学 Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
Siwei Liu, Di Jody Zhou
{"title":"Using cross-validation methods to select time series models: Promises and pitfalls","authors":"Siwei Liu,&nbsp;Di Jody Zhou","doi":"10.1111/bmsp.12330","DOIUrl":null,"url":null,"abstract":"<p>Vector autoregressive (VAR) modelling is widely employed in psychology for time series analyses of dynamic processes. However, the typically short time series in psychological studies can lead to overfitting of VAR models, impairing their predictive ability on unseen samples. Cross-validation (CV) methods are commonly recommended for assessing the predictive ability of statistical models. However, it is unclear how the performance of CV is affected by characteristics of time series data and the fitted models. In this simulation study, we examine the ability of two CV methods, namely,10-fold CV and blocked CV, in estimating the prediction errors of three time series models with increasing complexity (person-mean, AR, and VAR), and evaluate how their performance is affected by data characteristics. We then compare these CV methods to the traditional methods using the Akaike (AIC) and Bayesian (BIC) information criteria in their accuracy of selecting the most predictive models. We find that CV methods tend to underestimate prediction errors of simpler models, but overestimate prediction errors of VAR models, particularly when the number of observations is small. Nonetheless, CV methods, especially blocked CV, generally outperform the AIC and BIC. We conclude our study with a discussion on the implications of the findings and provide helpful guidelines for practice.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":null,"pages":null},"PeriodicalIF":1.5000,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12330","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"British Journal of Mathematical & Statistical Psychology","FirstCategoryId":"102","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/bmsp.12330","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Vector autoregressive (VAR) modelling is widely employed in psychology for time series analyses of dynamic processes. However, the typically short time series in psychological studies can lead to overfitting of VAR models, impairing their predictive ability on unseen samples. Cross-validation (CV) methods are commonly recommended for assessing the predictive ability of statistical models. However, it is unclear how the performance of CV is affected by characteristics of time series data and the fitted models. In this simulation study, we examine the ability of two CV methods, namely,10-fold CV and blocked CV, in estimating the prediction errors of three time series models with increasing complexity (person-mean, AR, and VAR), and evaluate how their performance is affected by data characteristics. We then compare these CV methods to the traditional methods using the Akaike (AIC) and Bayesian (BIC) information criteria in their accuracy of selecting the most predictive models. We find that CV methods tend to underestimate prediction errors of simpler models, but overestimate prediction errors of VAR models, particularly when the number of observations is small. Nonetheless, CV methods, especially blocked CV, generally outperform the AIC and BIC. We conclude our study with a discussion on the implications of the findings and provide helpful guidelines for practice.

Abstract Image

使用交叉验证方法选择时间序列模型:承诺和缺陷。
向量自回归(VAR)模型在心理学中广泛应用于动态过程的时间序列分析。然而,在心理学研究中,典型的短时间序列会导致VAR模型的过拟合,损害其对未知样本的预测能力。交叉验证(CV)方法通常被推荐用于评估统计模型的预测能力。然而,目前尚不清楚时间序列数据和拟合模型的特征如何影响CV的性能。在这项模拟研究中,我们检验了两种CV方法,即10倍CV和阻塞CV,在估计三种复杂性时间序列模型(人平均、AR和VAR)的预测误差方面的能力,并评估了它们的性能如何受到数据特征的影响。然后,我们将这些CV方法与使用赤池(AIC)和贝叶斯(BIC)信息标准的传统方法在选择最具预测模型的准确性方面进行了比较。我们发现,CV方法往往低估了简单模型的预测误差,而高估了VAR模型的预测误差,特别是在观测数较少的情况下。尽管如此,CV方法,特别是阻塞CV方法,通常优于AIC和BIC方法。最后,我们对研究结果的意义进行了讨论,并为实践提供了有益的指导。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
5.00
自引率
3.80%
发文量
34
审稿时长
>12 weeks
期刊介绍: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals including: • mathematical psychology • statistics • psychometrics • decision making • psychophysics • classification • relevant areas of mathematics, computing and computer software These include articles that address substantitive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信