Mining Big Data Using Parsimonious Factor and Shrinkage Methods

ERN: Other Econometrics: Applied Econometric Modeling in Macroeconomics (Topic) | Pub Date: 2013-07-15 | DOI: 10.2139/ssrn.2294110

Hyun Hak Kim, Norman R. Swanson

Citations: 15

Abstract

A number of recent studies in the economics literature have focused on the usefulness of factor models in the context of prediction using "big data". In this paper, our overarching question is whether such "big data" are useful for modelling low-frequency macroeconomic variables such as unemployment, inflation and GDP. In particular, we analyze the predictive benefits associated with the use of dimension-reducing independent component analysis (ICA) and sparse principal component analysis (SPCA), coupled with a variety of other factor estimation and data shrinkage methods, including bagging, boosting, and the elastic net, among others. We do so by carrying out a forecasting "horse-race" involving the estimation of 28 different baseline model types, each constructed using a variety of specification approaches, estimation approaches, and benchmark econometric models, and all used in the prediction of 11 key macroeconomic variables relevant for monetary policy assessment. In many instances, we find that several of our benchmark specifications, including autoregressive (AR) models, AR models with exogenous variables, and (Bayesian) model averaging, do not dominate more complicated nonlinear methods, and that using a combination of factor and other shrinkage methods often yields superior predictions. For example, simple averaging methods are mean square forecast error (MSFE) "best" in only 9 of 33 key cases considered. This is rather surprising new evidence that model averaging methods do not necessarily yield MSFE-best predictions. However, in order to "beat" model averaging methods, including arithmetic mean and Bayesian averaging approaches, we have introduced into our "horse-race" numerous complex new models that combine complicated factor estimation methods with interesting new forms of shrinkage. For example, SPCA yields MSFE-best prediction models in many cases, particularly when coupled with shrinkage. This result provides strong new evidence of the usefulness of sophisticated factor-based forecasting, and therefore of the use of "big data" in macroeconometric forecasting.
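The MSFE "horse-race" described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure or any of their 28 model types: the simulated series, the three toy competitors (an AR(1) benchmark, a historical mean, and their arithmetic average), and the function names `ols_ar1` and `horse_race` are all assumptions made for illustration. The sketch only shows the mechanics of recursive one-step-ahead forecasting and of ranking models by MSFE relative to an AR benchmark.

```python
import random


def ols_ar1(y):
    """Fit y_t = a + b * y_{t-1} by ordinary least squares; return (a, b)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx if sxx > 0 else 0.0
    return mz - b * mx, b


def horse_race(y, first_forecast):
    """Recursive one-step-ahead forecasts; return the MSFE of each toy model."""
    errors = {"AR(1)": [], "mean": [], "average": []}
    for t in range(first_forecast, len(y)):
        history = y[:t]  # only data available before period t
        a, b = ols_ar1(history)
        f_ar = a + b * history[-1]            # AR(1) benchmark forecast
        f_mean = sum(history) / len(history)  # historical-mean forecast
        f_avg = 0.5 * (f_ar + f_mean)         # simple arithmetic-mean combination
        for name, f in (("AR(1)", f_ar), ("mean", f_mean), ("average", f_avg)):
            errors[name].append((y[t] - f) ** 2)
    return {name: sum(e) / len(e) for name, e in errors.items()}


# Simulated stand-in for a macroeconomic series (assumption: AR(1), coefficient 0.7).
random.seed(0)
y = [0.0]
for _ in range(199):
    y.append(0.7 * y[-1] + random.gauss(0.0, 1.0))

msfe = horse_race(y, first_forecast=100)
best = min(msfe, key=msfe.get)  # the "MSFE-best" model in this toy race
relative = {name: v / msfe["AR(1)"] for name, v in msfe.items()}  # < 1 beats AR(1)
```

In a fuller version, the competitors would include factor-based forecasts (for example, factors estimated by SPCA or ICA and then passed through a shrinkage regression), and the race would be repeated for each target variable and forecast horizon.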

