Sample splitting and assessing goodness-of-fit of time series.
Authors: Richard A Davis, Leon Fernandes
Journal: Biometrika, vol. 112, no. 2, asaf017 (published 2025-03-05)
DOI: https://doi.org/10.1093/biomet/asaf017
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206451/pdf/
Citation count: 0
Abstract
A fundamental and often final step in time series modelling is to assess the quality of fit of a proposed model to the data. Since the underlying distribution of the innovations that generate a model is often not prescribed, goodness-of-fit tests typically take the form of testing the fitted residuals for serial independence. However, these fitted residuals are intrinsically dependent since they are based on the same parameter estimates, and thus standard tests of serial independence, such as those based on the autocorrelation function or auto-distance correlation function of the fitted residuals, need to be adjusted. The sample-splitting procedure of Pfister et al. (2018) is one such fix for the case of models for independent data, but fails to work in the dependent setting. In this article, sample splitting is leveraged in the time series setting to perform tests of serial dependence of fitted residuals using the autocorrelation function and auto-distance correlation function. The first [Formula: see text] of the data points are used to estimate the parameters of the model and then, using these parameter estimates, the last [Formula: see text] of the data points are used to compute the estimated residuals. Tests for serial independence are then based on these [Formula: see text] residuals. As long as the overlap between the [Formula: see text] and [Formula: see text] data splits is asymptotically [Formula: see text], the autocorrelation function and auto-distance correlation function tests of serial independence often have the same limit distributions as when the underlying residuals are indeed independent and identically distributed. 
In particular, if the first half of the data is used to estimate the parameters and the estimated residuals are computed for the entire dataset based on these parameter estimates, then the autocorrelation function and auto-distance correlation function can have the same limit distributions as if the residuals were independent and identically distributed. This procedure ameliorates the need for adjustment in the construction of confidence bounds for both the autocorrelation function and the auto-distance correlation function in goodness-of-fit testing.
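The special case in the last paragraph (estimate on the first half, compute residuals over the whole series) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the AR(1) model, the least-squares estimator, the simulated data, and all function names (`simulate_ar1`, `fit_ar1`, `ar1_residuals`, `sample_acf`) are hypothetical choices made for the example. It covers the autocorrelation function only, not the auto-distance correlation function.

```python
import numpy as np

def simulate_ar1(n, phi, rng):
    """Simulate x_t = phi * x_{t-1} + z_t with N(0, 1) noise, started at x_0 = 0."""
    x = np.zeros(n)
    z = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + z[t]
    return x

def fit_ar1(x):
    """Least-squares estimate of the AR(1) coefficient (assumed estimator)."""
    return float(np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1]))

def ar1_residuals(x, phi_hat):
    """One-step-ahead residuals x_t - phi_hat * x_{t-1}."""
    return x[1:] - phi_hat * x[:-1]

def sample_acf(r, max_lag):
    """Sample autocorrelations of r at lags 1..max_lag."""
    r = r - r.mean()
    denom = np.dot(r, r)
    return np.array([np.dot(r[h:], r[:-h]) / denom for h in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
n = 2000
x = simulate_ar1(n, phi=0.6, rng=rng)

# Step 1: estimate the parameter from the first half of the data only.
f_n = n // 2
phi_hat = fit_ar1(x[:f_n])

# Step 2: compute residuals over the entire series with that estimate.
resid = ar1_residuals(x, phi_hat)

# Step 3: compare the residual ACF with the usual iid bounds +/- 1.96 / sqrt(m),
# where m is the number of residuals.
acf = sample_acf(resid, max_lag=10)
bound = 1.96 / np.sqrt(len(resid))
n_outside = int(np.sum(np.abs(acf) > bound))
print(f"phi_hat = {phi_hat:.3f}, lags outside bounds: {n_outside} of {len(acf)}")
```

The point of the split is that the unadjusted iid bounds in step 3 are used as they stand. Whether that is justified for a given model depends on the conditions established in the article.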
About the journal:
Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.