{"title":"A Frisch-Waugh-Lovell theorem for empirical likelihood","authors":"Yichun Song","doi":"10.1016/j.csda.2025.108208","DOIUrl":"10.1016/j.csda.2025.108208","url":null,"abstract":"<div><div>A Frisch-Waugh-Lovell-type (FWL) theorem for empirical likelihood estimation with instrumental variables is presented, which resembles the standard FWL theorem in ordinary least squares (OLS), but its partitioning procedure employs the empirical likelihood weights at the solution rather than the original sample distribution. This result is leveraged to simplify the computational process through an iterative algorithm, where exogenous variables are partitioned out using weighted least squares, and the weights are updated between iterations. Furthermore, it is demonstrated that iterations converge locally to the original empirical likelihood estimate at a stochastically super-linear rate. A feasible iterative constrained optimization algorithm for calculating empirical-likelihood-based confidence intervals is provided, along with a discussion of its properties. Monte Carlo simulations indicate that the iterative algorithm is robust and produces results within the numerical tolerance of the original empirical likelihood estimator in finite samples, while significantly improving computation in large-scale problems. 
Additionally, the algorithm performs effectively in an illustrative application using the return to education framework.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108208"},"PeriodicalIF":1.5,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144137907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive-to-sub-null testing for mediation effects in structural equation models","authors":"Jiaqi Huang , Chuyun Ye , Lixing Zhu","doi":"10.1016/j.csda.2025.108205","DOIUrl":"10.1016/j.csda.2025.108205","url":null,"abstract":"<div><div>To effectively implement large-scale hypothesis testing of causal mediation effects and control false discovery rate (FDR) for linear structural equation models, this paper proposes an Adaptive-to-Sub-Null test (AtST) tailored specifically for the assessment of multidimensional mediation effects. The significant distinction of AtST from existing methods is that for every mediator, the weak limits of the test statistic under all mutually exclusive sub-null hypotheses uniformly conform to a chi-square distribution with one degree of freedom. Therefore, in the asymptotic sense, the significance level can be maintained and the <em>p</em>-values can be computed easily without any other prior information on the sub-null hypotheses or resampling technique. In theoretical investigations, we extend existing parameter estimation methods by allowing lower sparsity level in high-dimensional covariate vectors. These results offer a solid base for better FDR control by directly applying the classical Storey's method. We also apply a data-driven approach for selecting the tuning parameter of Storey's estimator. 
Simulations are conducted to demonstrate the efficacy and validity of the AtST, complemented by an analytical exploration of a genuine dataset for illustration.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108205"},"PeriodicalIF":1.5,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact statistical analysis for response-adaptive clinical trials: A general and computationally tractable approach","authors":"Stef Baas , Peter Jacko , Sofía S. Villar","doi":"10.1016/j.csda.2025.108207","DOIUrl":"10.1016/j.csda.2025.108207","url":null,"abstract":"<div><div>Response-adaptive clinical trial designs allow targeting a given objective by skewing the allocation of participants to treatments based on observed outcomes. Response-adaptive designs face greater regulatory scrutiny due to potential type I error rate inflation, which limits their uptake in practice. Existing approaches for type I error control either only work for specific designs, have a risk of Monte Carlo/approximation error, are conservative, or computationally intractable. To this end, a general and computationally tractable approach is developed for exact analysis in two-arm response-adaptive designs with binary outcomes. This approach can construct exact tests for designs using either a randomized or deterministic response-adaptive procedure. The constructed conditional and unconditional exact tests generalize Fisher's and Barnard's exact tests, respectively. Furthermore, the approach allows for complexities such as delayed outcomes, early stopping, or allocation of participants in blocks. The efficient implementation of forward recursion allows for testing of two-arm trials with 1,000 participants on a standard computer. Through an illustrative computational study of trials using randomized dynamic programming it is shown that, contrary to what is known for equal allocation, the conditional exact Wald test based on total successes has, almost uniformly, higher power than the unconditional exact Wald test. 
Two real-world trials with the above-mentioned complexities are re-analyzed to demonstrate the value of the new approach in controlling type I errors and/or improving the statistical power.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108207"},"PeriodicalIF":1.5,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dirichlet stochastic block model for composition-weighted networks","authors":"Iuliia Promskaia , Adrian O'Hagan , Michael Fop","doi":"10.1016/j.csda.2025.108204","DOIUrl":"10.1016/j.csda.2025.108204","url":null,"abstract":"<div><div>Network data are prevalent in applications where individual entities interact with each other, and often these interactions have associated weights representing the strength of association. Clustering such weighted network data is a common task, which involves identifying groups of nodes that display similarities in the way they interact. However, traditional clustering methods typically use edge weights in their raw form, overlooking that the observed weights are influenced by the nodes' capacities to distribute weights along the edges. This can lead to clustering results that primarily reflect nodes' total weight capacities rather than the specific interactions between them. One way to address this issue is to analyse the strengths of connections in relative rather than absolute terms, by transforming the relational weights into a compositional format. This approach expresses each edge weight as a proportion of the sending or receiving weight capacity of the respective node. To cluster these data, a Dirichlet stochastic block model tailored for composition-weighted networks is proposed. The model relies on direct modelling of compositional weight vectors using a Dirichlet mixture, where parameters are determined by the cluster labels of sender and receiver nodes. Inference is implemented via an extension of the classification expectation-maximisation algorithm, expressing the complete data likelihood of each node as a function of fixed cluster labels of the remaining nodes. A model selection criterion is derived to determine the optimal number of clusters. 
The proposed approach is validated through simulation studies, and its practical utility is illustrated on two real-world networks.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108204"},"PeriodicalIF":1.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Penalized maximum likelihood estimation with nonparametric Gaussian scale mixture errors","authors":"Seo-Young Park , Byungtae Seo","doi":"10.1016/j.csda.2025.108206","DOIUrl":"10.1016/j.csda.2025.108206","url":null,"abstract":"<div><div>The penalized least squares and maximum likelihood methods have been successfully employed for simultaneous parameter estimation and variable selection. However, outlying observations can severely affect the quality of the estimator and selection performance. Although some robust methods for variable selection have been proposed in the literature, they often lose substantial efficiency. This is primarily attributed to the excessive dependence on choosing additional tuning parameters or modifying the original objective functions as tools to enhance robustness. In response to these challenges, we use a nonparametric Gaussian scale mixture distribution for the regression error distribution. This approach allows the error distributions in the model to achieve great flexibility and provides data-adaptive robustness. Our proposed estimator exhibits desirable theoretical properties, including sparsity and oracle properties. In the estimation process, we employ a combination of expectation-maximization and gradient-based algorithms for the parametric and nonparametric components, respectively. 
Through comprehensive numerical studies, encompassing simulation studies and real data analysis, we substantiate the robust performance of the proposed method.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108206"},"PeriodicalIF":1.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heavy-tailed matrix-variate hidden Markov models","authors":"Salvatore D. Tomarchio","doi":"10.1016/j.csda.2025.108198","DOIUrl":"10.1016/j.csda.2025.108198","url":null,"abstract":"<div><div>The matrix-variate framework for hidden Markov models (HMMs) is expanded with two families of models using matrix-variate <em>t</em> and contaminated normal distributions. These models improve the handling of tail behavior and clustering, and address challenges in identifying outlying matrices in matrix-variate data. Two Expectation-Conditional Maximization (ECM) algorithms are implemented in the R package <strong>MatrixHMM</strong> for parameter estimation. Simulations assess parameter recovery, robustness, and anomaly detection, and show the advantages over alternative approaches. The models are applied to real-world data to analyze labor market dynamics across Italian provinces.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108198"},"PeriodicalIF":1.5,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical inference for partially shape-constrained function-on-scalar linear regression models","authors":"Kyunghee Han , Yeonjoo Park , Soo-Young Kim","doi":"10.1016/j.csda.2025.108200","DOIUrl":"10.1016/j.csda.2025.108200","url":null,"abstract":"<div><div>Functional linear regression models are widely used to link functional/longitudinal outcomes with multiple scalar predictors, identifying time-varying covariate effects through regression coefficient functions. Beyond assessing statistical significance, characterizing the shapes of coefficient functions is crucial for drawing interpretable scientific conclusions. Existing studies on shape-constrained analysis primarily focus on global shapes, which require strict prior knowledge of functional relationships across the entire domain. This often leads to misspecified regression models due to a lack of prior information, making them impractical for real-world applications. To address this, a flexible framework is introduced to identify partial shapes in regression coefficient functions. The proposed partial shape-constrained analysis enables researchers to validate functional shapes within a targeted sub-domain, avoiding the misspecification of shape constraints outside the sub-domain of interest. The method also allows for testing different sub-domains for individual covariates and multiple partial shape constraints across composite sub-domains. Our framework supports both kernel- and spline-based estimation approaches, ensuring robust performance with flexibility in computational preference. Finite-sample experiments across various scenarios demonstrate that the proposed framework significantly outperforms the application of global shape constraints to partial domains in both estimation and inference procedures. 
The inferential tool particularly maintains the type I error rate at the nominal significance level and exhibits increasing power with larger sample sizes, confirming the consistency of the test procedure. The practicality of partial shape-constrained inference is demonstrated through two applications: a clinical trial on NeuroBloc for type A-resistant cervical dystonia and the National Institute of Mental Health Schizophrenia Study.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108200"},"PeriodicalIF":1.5,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144083910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed variable screening for generalized linear models","authors":"Tianbo Diao , Bo Li , Lianqiang Qu , Liuquan Sun","doi":"10.1016/j.csda.2025.108203","DOIUrl":"10.1016/j.csda.2025.108203","url":null,"abstract":"<div><div>In this article, we develop a distributed variable screening method for generalized linear models. This method is designed to handle situations where both the sample size and the number of covariates are large. Specifically, the proposed method selects relevant covariates by using a sparsity-restricted surrogate likelihood estimator. It takes into account the joint effects of the covariates rather than just the marginal effect, and this characteristic enhances the reliability of the screening results. We establish the sure screening property of the proposed method, which ensures that with a high probability, the true model is included in the selected model. Simulation studies are conducted to evaluate the finite sample performance of the proposed method, and an application to a real dataset showcases its practical utility.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108203"},"PeriodicalIF":1.5,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantile Super Learning for independent and online settings with application to solar power forecasting","authors":"Herbert Susmann , Antoine Chambaz","doi":"10.1016/j.csda.2025.108202","DOIUrl":"10.1016/j.csda.2025.108202","url":null,"abstract":"<div><div>Estimating quantiles of an outcome conditional on covariates is of fundamental interest in statistics with broad application in probabilistic prediction and forecasting. An ensemble method for conditional quantile estimation is proposed, Quantile Super Learning, that combines predictions from multiple candidate algorithms based on their empirical performance measured with respect to a cross-validated empirical risk of the quantile loss function. Theoretical guarantees for both i.i.d. and online data scenarios are presented. The performance of <em>this</em> approach for quantile estimation and in forming prediction intervals is tested in simulation studies. Two case studies related to solar energy are used to illustrate Quantile Super Learning: in an i.i.d. setting, we predict the physical properties of perovskite materials for photovoltaic cells, and in an online setting we forecast ground solar irradiance based on output from dynamic weather ensemble models.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108202"},"PeriodicalIF":1.5,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monotone composite quantile regression neural network for censored data with a cure fraction","authors":"Xinran Zhang , Xiaohui Yuan , Chunjie Wang , Xinyuan Song","doi":"10.1016/j.csda.2025.108201","DOIUrl":"10.1016/j.csda.2025.108201","url":null,"abstract":"<div><div>The cure rate monotone composite quantile regression neural network model is investigated as an extension of the cure rate quantile model. It can uncover complex nonlinear relationships and effectively ensure the non-crossing of quantile predictions. An iterative algorithm coupled with data augmentation is developed to predict the survival time of susceptible subjects and the cure rate among all subjects. Simulation studies indicate that the proposed approach exhibits advantages in prediction over traditional statistical methods in finite samples when nonlinearity exists between response and predictors. The analysis of two real datasets further validates the utility of the proposed method.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108201"},"PeriodicalIF":1.5,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143935576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}