Sample splitting and assessing goodness-of-fit of time series.

IF 2.8 | CAS Tier 2 (Mathematics) | JCR Q2 (Biology)

Biometrika. Published: 2025-03-05; eCollection date: 2025-01-01. DOI: 10.1093/biomet/asaf017

Richard A Davis, Leon Fernandes

Biometrika, volume 112, issue 2, article asaf017.

Citations: 0

Abstract

A fundamental and often final step in time series modelling is to assess the quality of fit of a proposed model to the data. Since the underlying distribution of the innovations that generate a model is often not prescribed, goodness-of-fit tests typically take the form of testing the fitted residuals for serial independence. However, these fitted residuals are intrinsically dependent since they are based on the same parameter estimates, and thus standard tests of serial independence, such as those based on the autocorrelation function or auto-distance correlation function of the fitted residuals, need to be adjusted. The sample-splitting procedure of Pfister et al. (2018) is one such fix for the case of models for independent data, but fails to work in the dependent setting. In this article, sample splitting is leveraged in the time series setting to perform tests of serial dependence of fitted residuals using the autocorrelation function and auto-distance correlation function. The first [Formula: see text] of the data points are used to estimate the parameters of the model and then, using these parameter estimates, the last [Formula: see text] of the data points are used to compute the estimated residuals. Tests for serial independence are then based on these [Formula: see text] residuals. As long as the overlap between the [Formula: see text] and [Formula: see text] data splits is asymptotically [Formula: see text], the autocorrelation function and auto-distance correlation function tests of serial independence often have the same limit distributions as when the underlying residuals are indeed independent and identically distributed. 
In particular, if the first half of the data is used to estimate the parameters and the estimated residuals are computed for the entire dataset based on these parameter estimates, then the autocorrelation function and auto-distance correlation function can have the same limit distributions as if the residuals were independent and identically distributed. This procedure ameliorates the need for adjustment in the construction of confidence bounds for both the autocorrelation function and the auto-distance correlation function in goodness-of-fit testing.
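The abstract describes the splitting procedure only in words. The following is a minimal sketch of the idea for a zero-mean AR(1) model. It is not the authors' implementation. The following choices are illustrative assumptions made here:

* the least-squares estimator;
* the split sizes `p` (estimation points) and `q` (residual points), whose names are hypothetical because the abstract's own symbols are not recoverable from this page;
* the usual iid bound ±1.96/√m for the ACF;
* the standard Székely and Rizzo V-statistic distance correlation for the ADCF.

Every function name is hypothetical.

```python
import numpy as np


def simulate_ar1(n, phi, rng, burn=200):
    """Simulate X_t = phi * X_{t-1} + Z_t with iid N(0, 1) innovations.

    The first `burn` values are discarded so the returned series is close
    to stationary.
    """
    z = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + z[t]
    return x[burn:]


def fit_ar1(x):
    """Least-squares estimate of phi, assuming a zero-mean AR(1) with no intercept."""
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])


def split_residuals(x, p, q):
    """Hypothetical helper: estimate phi on the first p points only, then use
    that estimate to compute residuals r_t = x_t - phi_hat * x_{t-1} on the
    last q points."""
    phi_hat = fit_ar1(x[:p])
    n = len(x)
    start = max(n - q, 1)  # the residual at time t needs x_{t-1}
    t = np.arange(start, n)
    return x[t] - phi_hat * x[t - 1], phi_hat


def sample_acf(r, max_lag):
    """Sample autocorrelations of r at lags 1..max_lag."""
    d = r - r.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[:-h], d[h:]) / denom for h in range(1, max_lag + 1)])


def distance_correlation(u, v):
    """Sample distance correlation of two 1-D samples (V-statistic form)."""
    a = np.abs(u[:, None] - u[None, :])
    b = np.abs(v[:, None] - v[None, :])
    # Double-centre each pairwise-distance matrix.
    A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # clip tiny negative rounding error
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0


def auto_distance_correlation(r, max_lag):
    """ADCF: distance correlation between r_t and r_{t+h} for h = 1..max_lag."""
    return np.array([distance_correlation(r[:-h], r[h:]) for h in range(1, max_lag + 1)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    x = simulate_ar1(n, phi=0.6, rng=rng)

    # Special case from the abstract: estimate on the first half of the data,
    # then compute residuals for the entire series with that estimate.
    r, phi_hat = split_residuals(x, p=n // 2, q=n)

    acf = sample_acf(r, max_lag=10)
    bound = 1.96 / np.sqrt(len(r))  # usual iid 95% bound for the ACF
    print(f"phi_hat = {phi_hat:.3f}")
    print("lags with |ACF| > bound:", np.flatnonzero(np.abs(acf) > bound) + 1)
    print("ADCF at lags 1-5:", np.round(auto_distance_correlation(r, 5), 3))
```

The point of the split is that the ACF bound is the plain iid one, with no adjustment for parameter estimation. According to the abstract, this is often justified asymptotically. The sketch does not check that claim. For the ADCF, the sketch reports only the statistic, because its iid limit distribution is not a simple normal bound. Critical values (for example, obtained by simulation) are left out.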


Source journal: Biometrika (Biology)

CiteScore: 5.50

Self-citation rate: 3.70%

Review time: 6-12 weeks

Journal description: Biometrika is primarily a journal of statistics. It emphasizes papers containing original theoretical contributions of direct or potential value in applications. It also publishes papers in bordering fields from time to time.