{"title":"A Frisch-Waugh-Lovell theorem for empirical likelihood","authors":"Yichun Song","doi":"10.1016/j.csda.2025.108208","DOIUrl":"10.1016/j.csda.2025.108208","url":null,"abstract":"<div><div>A Frisch-Waugh-Lovell-type (FWL) theorem for empirical likelihood estimation with instrumental variables is presented, which resembles the standard FWL theorem in ordinary least squares (OLS), but its partitioning procedure employs the empirical likelihood weights at the solution rather than the original sample distribution. This result is leveraged to simplify the computational process through an iterative algorithm, where exogenous variables are partitioned out using weighted least squares, and the weights are updated between iterations. Furthermore, it is demonstrated that iterations converge locally to the original empirical likelihood estimate at a stochastically super-linear rate. A feasible iterative constrained optimization algorithm for calculating empirical-likelihood-based confidence intervals is provided, along with a discussion of its properties. Monte Carlo simulations indicate that the iterative algorithm is robust and produces results within the numerical tolerance of the original empirical likelihood estimator in finite samples, while significantly improving computation in large-scale problems. 
Additionally, the algorithm performs effectively in an illustrative application using the return to education framework.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108208"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144137907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heavy-tailed matrix-variate hidden Markov models","authors":"Salvatore D. Tomarchio","doi":"10.1016/j.csda.2025.108198","DOIUrl":"10.1016/j.csda.2025.108198","url":null,"abstract":"<div><div>The matrix-variate framework for hidden Markov models (HMMs) is expanded with two families of models using matrix-variate <em>t</em> and contaminated normal distributions. These models improve the handling of tail behavior and clustering, and address challenges in identifying outlying matrices in matrix-variate data. Two Expectation-Conditional Maximization (ECM) algorithms are implemented in the R package <strong>MatrixHMM</strong> for parameter estimation. Simulations assess parameter recovery, robustness, and anomaly detection, and show the advantages over alternative approaches. The models are applied to real-world data to analyze labor market dynamics across Italian provinces.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108198"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact statistical analysis for response-adaptive clinical trials: A general and computationally tractable approach","authors":"Stef Baas , Peter Jacko , Sofía S. Villar","doi":"10.1016/j.csda.2025.108207","DOIUrl":"10.1016/j.csda.2025.108207","url":null,"abstract":"<div><div>Response-adaptive clinical trial designs allow targeting a given objective by skewing the allocation of participants to treatments based on observed outcomes. Response-adaptive designs face greater regulatory scrutiny due to potential type I error rate inflation, which limits their uptake in practice. Existing approaches for type I error control either only work for specific designs, have a risk of Monte Carlo/approximation error, are conservative, or computationally intractable. To this end, a general and computationally tractable approach is developed for exact analysis in two-arm response-adaptive designs with binary outcomes. This approach can construct exact tests for designs using either a randomized or deterministic response-adaptive procedure. The constructed conditional and unconditional exact tests generalize Fisher's and Barnard's exact tests, respectively. Furthermore, the approach allows for complexities such as delayed outcomes, early stopping, or allocation of participants in blocks. The efficient implementation of forward recursion allows for testing of two-arm trials with 1,000 participants on a standard computer. Through an illustrative computational study of trials using randomized dynamic programming it is shown that, contrary to what is known for equal allocation, the conditional exact Wald test based on total successes has, almost uniformly, higher power than the unconditional exact Wald test. 
Two real-world trials with the above-mentioned complexities are re-analyzed to demonstrate the value of the new approach in controlling type I errors and/or improving the statistical power.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108207"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed variable screening for generalized linear models","authors":"Tianbo Diao , Bo Li , Lianqiang Qu , Liuquan Sun","doi":"10.1016/j.csda.2025.108203","DOIUrl":"10.1016/j.csda.2025.108203","url":null,"abstract":"<div><div>In this article, we develop a distributed variable screening method for generalized linear models. This method is designed to handle situations where both the sample size and the number of covariates are large. Specifically, the proposed method selects relevant covariates by using a sparsity-restricted surrogate likelihood estimator. It takes into account the joint effects of the covariates rather than just the marginal effect, and this characteristic enhances the reliability of the screening results. We establish the sure screening property of the proposed method, which ensures that with a high probability, the true model is included in the selected model. Simulation studies are conducted to evaluate the finite sample performance of the proposed method, and an application to a real dataset showcases its practical utility.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108203"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simultaneous confidence-bounded true discovery proportion perspective on localizing differences in smooth terms in regression models","authors":"David Swanson","doi":"10.1016/j.csda.2025.108197","DOIUrl":"10.1016/j.csda.2025.108197","url":null,"abstract":"<div><div>A method is demonstrated for localizing where two spline terms, or smooths, differ using a true discovery proportion (TDP)-based interpretation. The procedure yields a statement on the proportion of some region where true differences exist between two smooths. The methodology avoids ad hoc approaches to making such statements, like subsetting the data and performing hypothesis tests on the truncated spline terms. TDP estimates are 1-<em>α</em> confidence-bounded simultaneously, which means that a region's TDP estimate is a lower bound on the proportion of actual differences, or true discoveries, in that region, with high confidence regardless of the number of estimates made. The procedure is based on closed-testing using Simes local test. This local test requires that the multivariate <span><math><msup><mrow><mi>χ</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span> test statistics of generalized Wishart type underlying the method be positive regression dependent on subsets (PRDS), a result for which evidence is presented suggesting that the condition holds. 
Consistency of the procedure is demonstrated for generalized additive models with the tuning parameter chosen by REML or GCV, and the achievement of confidence-bounded TDP is shown in simulation and in an analysis of walking gait.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108197"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143906892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible modeling of left-truncated and interval-censored competing risks data with missing event types","authors":"Yichen Lou , Yuqing Ma , Liming Xiang , Jianguo Sun","doi":"10.1016/j.csda.2025.108229","DOIUrl":"10.1016/j.csda.2025.108229","url":null,"abstract":"<div><div>Interval-censored competing risks data arise in many cohort studies in clinical research, where multiple types of events subject to interval censoring are included and the occurrence of the primary event of interest may be censored by the occurrence of other events. The presence of missing event types and left truncation poses challenges to the regression analysis of such data. We propose a new two-stage estimation procedure under a class of semiparametric generalized odds rate transformation models to overcome these challenges. Our method first facilitates the estimation of both the probability of response and the probability of occurrence of each type of event under the missing at random assumption, using either parametric or non-parametric methods. An augmented inverse probability weighting likelihood based on the complete-case likelihood and data from subjects with missing type of event is then maximized for estimating regression parameters. We provide desirable asymptotic properties and construct a concordance index to evaluate the model's discriminative ability. 
The proposed method is demonstrated through extensive simulations and the analysis of data from the Amsterdam cohort study on HIV infection and AIDS.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108229"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144242893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small area prediction of counts under machine learning-type mixed models","authors":"Nicolas Frink, Timo Schmid","doi":"10.1016/j.csda.2025.108218","DOIUrl":"10.1016/j.csda.2025.108218","url":null,"abstract":"<div><div>Small area estimation methods are proposed that use generalized tree-based machine learning techniques to improve the estimation of disaggregated means in small areas using discrete survey data. Specifically, two existing approaches based on random forests - the Generalized Mixed Effects Random Forest (GMERF) and a Mixed Effects Random Forest (MERF) - are extended to accommodate count outcomes, addressing key challenges such as overdispersion. Additionally, three bootstrap methodologies designed to assess the reliability of point estimators for area-level means are evaluated. The numerical analysis shows that the MERF, which does not assume a Poisson distribution to model the mean behavior of count data, excels in scenarios of severe overdispersion. Conversely, the GMERF performs best under conditions where Poisson distribution assumptions are moderately met. In a case study using real-world data from the state of Guerrero, Mexico, the proposed methods effectively estimate area-level means while capturing the uncertainty inherent in overdispersed count data. 
These findings highlight their practical applicability for small area estimation.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108218"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144196139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Penalized maximum likelihood estimation with nonparametric Gaussian scale mixture errors","authors":"Seo-Young Park , Byungtae Seo","doi":"10.1016/j.csda.2025.108206","DOIUrl":"10.1016/j.csda.2025.108206","url":null,"abstract":"<div><div>The penalized least squares and maximum likelihood methods have been successfully employed for simultaneous parameter estimation and variable selection. However, outlying observations can severely affect the quality of the estimator and selection performance. Although some robust methods for variable selection have been proposed in the literature, they often lose substantial efficiency. This is primarily attributed to the excessive dependence on choosing additional tuning parameters or modifying the original objective functions as tools to enhance robustness. In response to these challenges, we use a nonparametric Gaussian scale mixture distribution for the regression error distribution. This approach allows the error distributions in the model to achieve great flexibility and provides data-adaptive robustness. Our proposed estimator exhibits desirable theoretical properties, including sparsity and oracle properties. In the estimation process, we employ a combination of expectation-maximization and gradient-based algorithms for the parametric and nonparametric components, respectively. 
Through comprehensive numerical studies, encompassing simulation studies and real data analysis, we substantiate the robust performance of the proposed method.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108206"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantile Super Learning for independent and online settings with application to solar power forecasting","authors":"Herbert Susmann , Antoine Chambaz","doi":"10.1016/j.csda.2025.108202","DOIUrl":"10.1016/j.csda.2025.108202","url":null,"abstract":"<div><div>Estimating quantiles of an outcome conditional on covariates is of fundamental interest in statistics with broad application in probabilistic prediction and forecasting. An ensemble method for conditional quantile estimation is proposed, Quantile Super Learning, that combines predictions from multiple candidate algorithms based on their empirical performance measured with respect to a cross-validated empirical risk of the quantile loss function. Theoretical guarantees for both i.i.d. and online data scenarios are presented. The performance of <em>this</em> approach for quantile estimation and in forming prediction intervals is tested in simulation studies. Two case studies related to solar energy are used to illustrate Quantile Super Learning: in an i.i.d. setting, we predict the physical properties of perovskite materials for photovoltaic cells, and in an online setting we forecast ground solar irradiance based on output from dynamic weather ensemble models.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108202"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed iterative hard thresholding for variable selection in Tobit models","authors":"Changxin Yang , Zhongyi Zhu , Hongmei Lin , Zengyan Fan , Heng Lian","doi":"10.1016/j.csda.2025.108227","DOIUrl":"10.1016/j.csda.2025.108227","url":null,"abstract":"<div><div>While there is a substantial body of research on high-dimensional regression with left-censored responses, few methods address this problem in a distributed manner. Due to data transmission limitations and privacy concerns, centralizing all data is often impractical, necessitating a method for collaborative learning with distributed data. In this paper, we employ the Iterative Hard Thresholding (IHT) method for the Tobit model to address this challenge, allowing one to directly specify the desired sparsity and offering an alternative estimation and variable selection approach. Theoretical analysis shows that our estimator achieves a nearly minimax-optimal convergence rate using only a few rounds of communication. Its practical performance is evaluated under both the pooled and the distributed setting. The former highlights its competitive estimation efficiency and variable selection performance compared to existing approaches, while the latter demonstrates that the decentralized estimator closely matches the performance of its centralized counterpart. 
When applied to high-dimensional left-censored HIV viral load data, our method also demonstrates comparable performance.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108227"},"PeriodicalIF":1.5,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144203578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}