Jiawei Wen , Songshan Yang , Christina Dan Wang , Yifan Jiang , Runze Li
{"title":"Feature-splitting algorithms for ultrahigh dimensional quantile regression","authors":"Jiawei Wen , Songshan Yang , Christina Dan Wang , Yifan Jiang , Runze Li","doi":"10.1016/j.jeconom.2023.01.028","DOIUrl":"10.1016/j.jeconom.2023.01.028","url":null,"abstract":"<div><div>This paper is concerned with computational issues related to penalized quantile regression (PQR) with ultrahigh dimensional predictors. Various algorithms have been developed for PQR, but they become ineffective and/or infeasible in the presence of ultrahigh dimensional predictors due to storage and scalability limitations. The variable updating schema of the feature-splitting algorithm that directly applies the ordinary alternating direction method of multipliers (ADMM) to ultrahigh dimensional PQR may cause the algorithm to fail to converge. To tackle this hurdle, we propose an efficient and parallelizable algorithm for ultrahigh dimensional PQR based on the three-block ADMM. The compatibility of the proposed algorithm with parallel computing alleviates the storage and scalability limitations of a single machine in large-scale data processing. We establish the rate of convergence of the newly proposed algorithm. In addition, Monte Carlo simulations are conducted to compare the finite sample performance of the proposed algorithm with that of other existing algorithms. The numerical comparison implies that the proposed algorithm significantly outperforms the existing ones. We further illustrate the proposed algorithm via an empirical analysis of a real-world data set.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105426"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47340418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sokbae Lee , Yuan Liao , Myung Hwan Seo , Youngki Shin
{"title":"Fast inference for quantile regression with tens of millions of observations","authors":"Sokbae Lee , Yuan Liao , Myung Hwan Seo , Youngki Shin","doi":"10.1016/j.jeconom.2024.105673","DOIUrl":"10.1016/j.jeconom.2024.105673","url":null,"abstract":"<div><div><span>Big data analytics<span><span> has opened new avenues in economic research, but the challenge of analyzing datasets with tens of millions of observations is substantial. Conventional econometric methods based on extremum estimators require large amounts of computing resources and memory, which are often not readily available. In this paper, we focus on linear </span>quantile<span> regression applied to “ultra-large” datasets, such as U.S. decennial censuses. A fast inference framework is presented, utilizing stochastic subgradient descent (S-subGD) updates. The inference procedure handles cross-sectional data sequentially: (i) updating the parameter estimate with each incoming “new observation”, (ii) aggregating it as a </span></span></span><em>Polyak–Ruppert</em> average, and (iii) computing a pivotal statistic for inference using only a solution path. The methodology draws from time-series regression to create an asymptotically pivotal statistic through random scaling. Our proposed test statistic is calculated in a fully online fashion and critical values are calculated without resampling. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. For inference problems as large as <span><math><mrow><mrow><mo>(</mo><mi>n</mi><mo>,</mo><mi>d</mi><mo>)</mo></mrow><mo>∼</mo><mrow><mo>(</mo><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>7</mn></mrow></msup><mo>,</mo><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>3</mn></mrow></msup><mo>)</mo></mrow></mrow></math></span>, where <span><math><mi>n</mi></math></span><span> is the sample size and </span><span><math><mi>d</mi></math></span> is the number of regressors, our method generates new insights, surpassing current inference methods in computation. Our method specifically reveals trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling for over <span><math><mrow><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>3</mn></mrow></msup></mrow></math></span> covariates to mitigate confounding effects.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105673"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139758504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco Battaglini , Luigi Guiso , Chiara Lacava , Douglas L. Miller , Eleonora Patacchini
{"title":"Refining public policies with machine learning: The case of tax auditing","authors":"Marco Battaglini , Luigi Guiso , Chiara Lacava , Douglas L. Miller , Eleonora Patacchini","doi":"10.1016/j.jeconom.2024.105847","DOIUrl":"10.1016/j.jeconom.2024.105847","url":null,"abstract":"<div><div>We study how machine learning techniques can be used to improve tax auditing efficiency using administrative data without the need for randomized audits. Using Italy’s population data on sole proprietorship tax returns and audits, our new approach addresses the challenge that predictions must be trained on human-selected data. There are substantial margins for raising revenue from audits by improving the selection of taxpayers to audit with machine learning. Replacing the 10% least promising audits with an equal number selected by our algorithm raises detected tax evasion by as much as 39%, and evasion that is actually paid back by 29%.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105847"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinhan Xie , Xiaodong Yan , Bei Jiang , Linglong Kong
{"title":"Statistical inference for smoothed quantile regression with streaming data","authors":"Jinhan Xie , Xiaodong Yan , Bei Jiang , Linglong Kong","doi":"10.1016/j.jeconom.2024.105924","DOIUrl":"10.1016/j.jeconom.2024.105924","url":null,"abstract":"<div><div>In this paper, we tackle the problem of conducting valid statistical inference for quantile regression with streaming data. The main difficulties are that the quantile regression loss function is non-smooth and it is often infeasible to store the entire dataset in memory, rendering traditional methodologies ineffective. We introduce a fully online updating method for statistical inference in smoothed quantile regression with streaming data to overcome these issues. Our main contributions are twofold. First, for low-dimensional data, we present an incremental updating algorithm to obtain the smoothed quantile regression estimator with the streaming data set. The proposed estimator allows us to construct asymptotically exact statistical inference procedures. Second, within the realm of high-dimensional data, we develop an online debiased lasso procedure to accommodate the special sparse structure of streaming data. The proposed online debiased approach is updated with only the current data and summary statistics of historical data and corrects an approximation error term from online updating with streaming data. Furthermore, theoretical results such as estimation consistency and asymptotic normality are established to justify its validity in both settings. Our findings are supported by simulation studies and illustrated through applications to Seoul’s bike-sharing demand data and index fund data.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105924"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maximilian Ahrens , Deniz Erdemlioglu , Michael McMahon , Christopher J. Neely , Xiye Yang
{"title":"Mind your language: Market responses to central bank speeches","authors":"Maximilian Ahrens , Deniz Erdemlioglu , Michael McMahon , Christopher J. Neely , Xiye Yang","doi":"10.1016/j.jeconom.2024.105921","DOIUrl":"10.1016/j.jeconom.2024.105921","url":null,"abstract":"<div><div>Central bank communication between meetings often moves markets, but researchers have traditionally paid less attention to it. Using a dataset of U.S. Federal Reserve speeches, we develop supervised multimodal natural language processing methods to identify how monetary policy news affects bond and stock market volatility and tail risk through implied changes in forecasts of GDP, inflation, and unemployment. We find that forecast revisions derived from FOMC-member speech can help explain volatility and tail risk in both equity and bond markets. Speeches from Chairs tend to produce larger forecast revisions and unconditionally raise volatility and tail risk, but their economic signals can <em>calm</em> markets (reduce volatility and tail risk). There is some evidence that a speaker’s monetary policy views may affect the impact of implied forecast revisions after conditioning on GDP growth.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105921"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monitoring multi-country macroeconomic risk: A quantile factor-augmented vector autoregressive (QFAVAR) approach","authors":"Dimitris Korobilis , Maximilian Schröder","doi":"10.1016/j.jeconom.2024.105730","DOIUrl":"10.1016/j.jeconom.2024.105730","url":null,"abstract":"<div><div>A multi-country quantile factor-augmented vector autoregression is proposed to model heterogeneities both across countries and across characteristics of the distributions of macroeconomic time series. The presence of quantile factors enables a parsimonious summary of these two heterogeneities by accounting for dependencies in the cross-sectional dimension as well as across different quantiles of macroeconomic data. Using monthly euro area data, the strong empirical performance of the new model in gauging the impact of global shocks on country-level macroeconomic risks is demonstrated. The short-term tail forecasts of QFAVAR outperform those of FAVARs with symmetric Gaussian errors as well as univariate and multivariate specifications featuring stochastic volatility. Modeling individual quantiles enables scenario analysis of macroeconomic risks, a unique feature absent in FAVARs with stochastic volatility or flexible error distributions.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105730"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexandre d'Aspremont , Simon Ben Arous , Jean-Charles Bricongne , Benjamin Lietti , Baptiste Meunier
{"title":"Satellites turn “concrete”: Tracking cement with satellite data and neural networks","authors":"Alexandre d'Aspremont , Simon Ben Arous , Jean-Charles Bricongne , Benjamin Lietti , Baptiste Meunier","doi":"10.1016/j.jeconom.2024.105923","DOIUrl":"10.1016/j.jeconom.2024.105923","url":null,"abstract":"<div><div>This paper exploits daily infrared images taken from satellites to track economic activity in advanced and emerging countries. We first develop a framework to read, clean, and exploit satellite images. Our algorithm uses the laws of physics (Planck's law) and machine learning to detect the heat produced by cement plants in activity. This allows us to monitor in real time whether a cement plant is working. Using this on around 1,000 plants, we construct a satellite-based index. We show that using this satellite index outperforms benchmark models and alternative indicators for nowcasting the production of the cement industry as well as the activity in the construction sector. Comparing across methods, neural networks appear to yield more accurate predictions as they allow us to exploit the granularity of our dataset. Overall, combining satellite images and machine learning can help policymakers make informed and swift economic policy decisions by nowcasting economic activity accurately and in real time.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105923"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing heterogeneous treatment effect with quantile regression under covariate-adaptive randomization","authors":"Yang Liu , Lucy Xia , Feifang Hu","doi":"10.1016/j.jeconom.2024.105808","DOIUrl":"10.1016/j.jeconom.2024.105808","url":null,"abstract":"<div><div><span><span><span>In economic studies and </span>clinical trials, it is prevalent to observe heterogeneous treatment effects that vary depending on the relative locations of units in the distribution of responses. In this study, we propose using </span>quantile regression to estimate and conduct inference for conditional quantile treatment effects (cQTEs) in covariate-adaptive randomized experiments. First, we present sufficient conditions for consistently estimating the cQTEs, concerning the bias due to omitting important covariates in the inference stage. Second, we derive the weak convergence of the quantile regression process and develop a covariate-adaptive randomized bootstrap (</span><span>CAR-BS</span>) for standard error estimation. Our theoretical results indicate that the Wald test adjusted by <span>CAR-BS</span> is valid in terms of the Type I error, for a large class of covariate-adaptive randomization procedures at different quantiles, regardless of the choice of covariates used in inference. We perform extensive numerical and empirical studies to demonstrate advantages of the new method in various settings.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105808"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141689735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How do firms’ financial conditions influence the transmission of monetary policy? A non-parametric local projection approach","authors":"Livia Paranhos","doi":"10.1016/j.jeconom.2024.105886","DOIUrl":"10.1016/j.jeconom.2024.105886","url":null,"abstract":"<div><div>How do monetary policy shocks affect firm investment? This paper provides new evidence on US non-financial firms and a novel non-parametric framework based on random forests. The key advantage of the methodology is that it does not impose any assumptions on how the effect of shocks varies across firms thereby allowing for general forms of heterogeneity in the transmission of shocks. My estimates suggest that there exists a threshold in the level of firm risk above which monetary policy is much less effective. Additionally, there is no evidence that the effect of policy varies with firm risk for the 75% of firms in the sample with higher risk. The proposed methodology is a generalization of local projections and nests several common local projection specifications, including linear and nonlinear.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105886"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Themed issue: Quantile regression and data heterogeneity","authors":"Xiaohong Chen, Xuming He","doi":"10.1016/j.jeconom.2024.105946","DOIUrl":"10.1016/j.jeconom.2024.105946","url":null,"abstract":"","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105946"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}