{"title":"Dimension-agnostic change point detection","authors":"Hanjia Gao , Runmin Wang , Xiaofeng Shao","doi":"10.1016/j.jeconom.2025.106012","DOIUrl":"10.1016/j.jeconom.2025.106012","url":null,"abstract":"<div><div>Change point testing for high-dimensional data has attracted a lot of attention in statistics, econometrics and machine learning owing to the emergence of high-dimensional data with structural breaks from many fields. In practice, when the dimension is less than the sample size but is not small, it is often unclear whether a method that is tailored to high-dimensional data or simply a classical method that is developed and justified for low-dimensional data is preferred. In addition, the methods designed for low-dimensional data may not work well in the high-dimensional environment and vice versa. In this paper, we propose a dimension-agnostic testing procedure targeting a single change point in the mean of a multivariate weakly dependent time series. Specifically, we can show that the limiting null distribution for our test statistic is the same regardless of the dimensionality and the magnitude of cross-sectional dependence. The power analysis is also conducted to understand the large sample behavior of the proposed test. 
Through Monte Carlo simulations and a real data illustration, we demonstrate that the finite sample results strongly corroborate the theory and suggest that the proposed test can be used as a benchmark for change-point detection of time series of low, medium, and high dimensions with complex cross-sectional and temporal dependence.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"250 ","pages":"Article 106012"},"PeriodicalIF":9.9,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143923088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pairwise valid instruments","authors":"Zhenting Sun , Kaspar Wüthrich","doi":"10.1016/j.jeconom.2025.106009","DOIUrl":"10.1016/j.jeconom.2025.106009","url":null,"abstract":"<div><div>Finding valid instruments is difficult. We propose Validity Set Instrumental Variable (VSIV) estimation, a method for estimating local average treatment effects (LATEs) in heterogeneous causal effect models when the instruments are partially invalid. We consider settings with pairwise valid instruments, that is, instruments that are valid for a subset of instrument value pairs. VSIV estimation exploits testable implications of instrument validity to remove invalid pairs and provides estimates of the LATEs for all remaining pairs, which can be aggregated into a single parameter of interest using researcher-specified weights. We show that the proposed VSIV estimators are asymptotically normal under weak conditions and remove or reduce the asymptotic bias relative to standard LATE estimators (that is, LATE estimators that do not use testable implications to remove invalid variation). We evaluate the finite sample properties of VSIV estimation in application-based simulations and apply our method to estimate the returns to college education using parental education as an instrument.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"250 ","pages":"Article 106009"},"PeriodicalIF":9.9,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143917270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference on quantile processes with a finite number of clusters","authors":"Andreas Hagemann","doi":"10.1016/j.jeconom.2024.105672","DOIUrl":"10.1016/j.jeconom.2024.105672","url":null,"abstract":"<div><div>I introduce a generic method for inference on entire quantile<span> and regression quantile<span> processes in the presence of a finite number of large and arbitrarily heterogeneous clusters. The method asymptotically controls size by generating statistics that exhibit enough distributional symmetry such that randomization tests can be applied. The randomization test does not require ex-ante matching of clusters, is free of user-chosen parameters, and performs well at conventional significance levels with as few as five clusters. The method tests standard (non-sharp) hypotheses and can even be asymptotically similar in empirically relevant situations. The main focus of the paper is inference on quantile treatment effects but the method applies more broadly. Numerical and empirical examples are provided.</span></span></div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105672"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139677800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semiparametric approach to estimation of marginal mean effects and marginal quantile effects","authors":"Seong-ho Lee , Yanyuan Ma , Elvezio Ronchetti","doi":"10.1016/j.jeconom.2023.05.002","DOIUrl":"10.1016/j.jeconom.2023.05.002","url":null,"abstract":"<div><div>We consider a semiparametric generalized linear model and study estimation of both marginal mean effects and marginal quantile effects in this model. We propose an approximate maximum likelihood estimator, and rigorously establish the consistency, the asymptotic normality, and the semiparametric efficiency of our method in both the marginal mean effect and the marginal quantile effect estimation. Simulation studies are conducted to illustrate the finite sample performance, and we apply the new tool to analyze a Swiss non-labor income data and discover a new interesting predictor.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105455"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47877548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous estimation and group identification for network vector autoregressive model with heterogeneous nodes","authors":"Xuening Zhu , Ganggang Xu , Jianqing Fan","doi":"10.1016/j.jeconom.2023.105564","DOIUrl":"10.1016/j.jeconom.2023.105564","url":null,"abstract":"<div><div>Individuals or companies in a large social or financial network often display rather heterogeneous behaviors for various reasons. In this work, we propose a network vector autoregressive model with a latent group structure to model heterogeneous dynamic patterns observed from network nodes, for which group-wise network effects and time-invariant fixed-effects can be naturally incorporated. In our framework, the model parameters and network node memberships can be simultaneously estimated by minimizing a least-squares type objective function. In particular, our theoretical investigation allows the number of latent groups <span><math><mi>G</mi></math></span> to be over-specified when achieving the estimation consistency of the model parameters and group memberships, which significantly improves the robustness of the proposed approach. When <span><math><mi>G</mi></math></span> is correctly specified, valid statistical inference can be made for model parameters based on the asymptotic normality of the estimators. A data-driven criterion is developed to consistently identify the true group number for practical use. 
Extensive simulation studies and two real data examples are used to demonstrate the effectiveness of the proposed methodology.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105564"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135455296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiway empirical likelihood","authors":"Harold D. Chiang , Yukitoshi Matsushita , Taisuke Otsu","doi":"10.1016/j.jeconom.2024.105861","DOIUrl":"10.1016/j.jeconom.2024.105861","url":null,"abstract":"<div><div>This paper develops a general methodology to conduct statistical inference for observations indexed by multiple sets of entities. We propose a novel multiway empirical likelihood statistic that converges to a chi-square distribution under the non-degenerate case, where corresponding Hoeffding type decomposition is dominated by linear terms. Our methodology is related to the notion of jackknife empirical likelihood but the leave-out pseudo values are constructed by leaving out columns or rows. We further develop a modified version of our multiway empirical likelihood statistic, which converges to a chi-square distribution regardless of the degeneracy, and discuss its desirable higher-order property in a simplified setup. The proposed methodology is illustrated by several important econometric problems, such as bipartite network, generalized estimating equations, and three-way observations.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105861"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature-splitting algorithms for ultrahigh dimensional quantile regression","authors":"Jiawei Wen , Songshan Yang , Christina Dan Wang , Yifan Jiang , Runze Li","doi":"10.1016/j.jeconom.2023.01.028","DOIUrl":"10.1016/j.jeconom.2023.01.028","url":null,"abstract":"<div><div>This paper is concerned with computational issues related to penalized quantile regression (PQR) with ultrahigh dimensional predictors. Various algorithms have been developed for PQR, but they become ineffective and/or infeasible in the presence of ultrahigh dimensional predictors due to the storage and scalability limitations. The variable updating schema of the feature-splitting algorithm that directly applies the ordinary alternating direction method of multiplier (ADMM) to ultrahigh dimensional PQR may make the algorithm fail to converge. To tackle this hurdle, we propose an efficient and parallelizable algorithm for ultrahigh dimensional PQR based on the three-block ADMM. The compatibility of the proposed algorithm with parallel computing alleviates the storage and scalability limitations of a single machine in the large-scale data processing. We establish the rate of convergence of the newly proposed algorithm. In addition, Monte Carlo simulations are conducted to compare the finite sample performance of the proposed algorithm with that of other existing algorithms. The numerical comparison implies that the proposed algorithm significantly outperforms the existing ones. 
We further illustrate the proposed algorithm via an empirical analysis of a real-world data set.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105426"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47340418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast inference for quantile regression with tens of millions of observations","authors":"Sokbae Lee , Yuan Liao , Myung Hwan Seo , Youngki Shin","doi":"10.1016/j.jeconom.2024.105673","DOIUrl":"10.1016/j.jeconom.2024.105673","url":null,"abstract":"<div><div><span>Big data analytics<span><span> has opened new avenues in economic research, but the challenge of analyzing datasets with tens of millions of observations is substantial. Conventional econometric methods based on extreme estimators require large amounts of computing resources and memory, which are often not readily available. In this paper, we focus on linear </span>quantile<span> regression applied to “ultra-large” datasets, such as U.S. decennial censuses. A fast inference framework is presented, utilizing stochastic subgradient descent (S-subGD) updates. The inference procedure handles cross-sectional data sequentially: (i) updating the parameter estimate with each incoming “new observation”, (ii) aggregating it as a </span></span></span><em>Polyak–Ruppert</em> average, and (iii) computing a pivotal statistic for inference using only a solution path. The methodology draws from time-series regression to create an asymptotically pivotal statistic through random scaling. Our proposed test statistic is calculated in a fully online fashion and critical values are calculated without resampling. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. 
For inference problems as large as <span><math><mrow><mrow><mo>(</mo><mi>n</mi><mo>,</mo><mi>d</mi><mo>)</mo></mrow><mo>∼</mo><mrow><mo>(</mo><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>7</mn></mrow></msup><mo>,</mo><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>3</mn></mrow></msup><mo>)</mo></mrow></mrow></math></span>, where <span><math><mi>n</mi></math></span><span> is the sample size and </span><span><math><mi>d</mi></math></span> is the number of regressors, our method generates new insights, surpassing current inference methods in computation. Our method specifically reveals trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling over <span><math><mrow><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>3</mn></mrow></msup></mrow></math></span> covariates to mitigate confounding effects.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105673"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139758504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Refining public policies with machine learning: The case of tax auditing","authors":"Marco Battaglini , Luigi Guiso , Chiara Lacava , Douglas L. Miller , Eleonora Patacchini","doi":"10.1016/j.jeconom.2024.105847","DOIUrl":"10.1016/j.jeconom.2024.105847","url":null,"abstract":"<div><div>We study how machine learning techniques can be used to improve tax auditing efficiency using administrative data without the need of randomized audits. Using Italy’s population data on sole proprietorship tax returns and audits, our new approach addresses the challenge that predictions must be trained on human-selected data. There are substantial margins for raising revenue from audits by improving the selection of taxpayers to audit with machine learning. Replacing the 10% least promising audits with an equal number selected by our algorithm raises detected tax evasion by as much as 39%, and evasion that is actually paid back by 29%.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105847"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144068128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical inference for smoothed quantile regression with streaming data","authors":"Jinhan Xie , Xiaodong Yan , Bei Jiang , Linglong Kong","doi":"10.1016/j.jeconom.2024.105924","DOIUrl":"10.1016/j.jeconom.2024.105924","url":null,"abstract":"<div><div>In this paper, we tackle the problem of conducting valid statistical inference for quantile regression with streaming data. The main difficulties are that the quantile regression loss function is non-smooth and it is often infeasible to store the entire dataset in memory, rendering traditional methodologies ineffective. We introduce a fully online updating method for statistical inference in smoothed quantile regression with streaming data to overcome these issues. Our main contributions are twofold. First, for low-dimensional data, we present an incremental updating algorithm to obtain the smoothed quantile regression estimator with the streaming data set. The proposed estimator allows us to construct asymptotically exact statistical inference procedures. Second, within the realm of high-dimensional data, we develop an online debiased lasso procedure to accommodate the special sparse structure of streaming data. The proposed online debiased approach is updated with only the current data and summary statistics of historical data and corrects an approximation error term from online updating with streaming data. Furthermore, theoretical results such as estimation consistency and asymptotic normality are established to justify its validity in both settings. 
Our findings are supported by simulation studies and illustrated through applications to Seoul’s bike-sharing demand data and index fund data.</div></div>","PeriodicalId":15629,"journal":{"name":"Journal of Econometrics","volume":"249 ","pages":"Article 105924"},"PeriodicalIF":9.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}