Using cross-validation methods to select time series models: Promises and pitfalls

IF 1.8 · Region 3 (Psychology) · Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

British Journal of Mathematical & Statistical Psychology. Pub Date: 2023-12-07. DOI: 10.1111/bmsp.12330

Siwei Liu, Di Jody Zhou

British Journal of Mathematical & Statistical Psychology, 77(2), 337-355. Journal Article. https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bmsp.12330


Abstract

Vector autoregressive (VAR) modelling is widely employed in psychology for time series analyses of dynamic processes. However, the typically short time series in psychological studies can lead to overfitting of VAR models, impairing their predictive ability on unseen samples. Cross-validation (CV) methods are commonly recommended for assessing the predictive ability of statistical models. However, it is unclear how the performance of CV is affected by characteristics of time series data and the fitted models. In this simulation study, we examine the ability of two CV methods, namely 10-fold CV and blocked CV, to estimate the prediction errors of three time series models of increasing complexity (person-mean, AR, and VAR), and evaluate how their performance is affected by data characteristics. We then compare these CV methods with the traditional Akaike (AIC) and Bayesian (BIC) information criteria in their accuracy of selecting the most predictive model. We find that CV methods tend to underestimate prediction errors of simpler models, but overestimate prediction errors of VAR models, particularly when the number of observations is small. Nonetheless, CV methods, especially blocked CV, generally outperform the AIC and BIC. We conclude with a discussion of the implications of the findings and provide guidelines for practice.
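To make the comparison in the abstract concrete, the following is a minimal sketch of how 10-fold CV and blocked CV might be used to estimate one-step prediction error for a person-mean model and an AR(1) model. It is not the authors' simulation code. Several things here are assumptions for illustration: the series is univariate, "blocked" is taken to mean contiguous, unshuffled folds (the paper's exact definition may differ), the models are fitted by ordinary least squares, and all function names (`simulate_ar1`, `cv_mse`, `random_folds`, `blocked_folds`) are hypothetical.

```python
import numpy as np

def simulate_ar1(n, phi=0.5, mean=0.0, seed=0):
    """Simulate a univariate AR(1) series (illustrative data only)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    y[0] = mean + rng.normal()
    for t in range(1, n):
        y[t] = mean + phi * (y[t - 1] - mean) + rng.normal()
    return y

def fit_person_mean(x_train, y_train):
    # Person-mean model: predict every observation with the training mean.
    return np.array([y_train.mean(), 0.0])

def fit_ar1(x_train, y_train):
    # AR(1) via ordinary least squares: y_t = b0 + b1 * y_{t-1} + e_t.
    X = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    return coef

def predict(coef, x):
    return coef[0] + coef[1] * x

def random_folds(n, k=10, seed=0):
    # Standard k-fold CV: indices are shuffled before being split.
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def blocked_folds(n, k=10):
    # Blocked CV (assumed form): each fold is a contiguous segment.
    return np.array_split(np.arange(n), k)

def cv_mse(y, fit, folds):
    """Mean squared one-step prediction error, averaged over held-out folds."""
    x_lag, y_cur = y[:-1], y[1:]  # (previous value, current value) pairs
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y_cur), dtype=bool)
        train_mask[test_idx] = False
        coef = fit(x_lag[train_mask], y_cur[train_mask])
        resid = y_cur[test_idx] - predict(coef, x_lag[test_idx])
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Usage: compare the two CV schemes on one short simulated series.
y = simulate_ar1(n=100, phi=0.5, seed=1)
n_pairs = len(y) - 1
for name, fit in [("person-mean", fit_person_mean), ("AR(1)", fit_ar1)]:
    print(name,
          "10-fold:", cv_mse(y, fit, random_folds(n_pairs)),
          "blocked:", cv_mse(y, fit, blocked_folds(n_pairs)))
```

The only difference between the two schemes in this sketch is whether indices are shuffled before splitting. With shuffling, held-out points sit next to training points in time, so serial dependence can leak across the split; contiguous blocks reduce that leakage. A VAR model would extend `fit_ar1` to several variables with cross-lagged predictors.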


Source journal

British Journal of Mathematical & Statistical Psychology (Medicine: Mathematics, Interdisciplinary Applications)

CiteScore: 5.00

Self-citation rate: 3.80%

Review time: >12 weeks

About the journal: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including:
• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software

These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.