{"title":"An integrated method for clustering and association network inference","authors":"Jeanne Tous, Julien Chiquet","doi":"10.1016/j.csda.2026.108347","DOIUrl":"10.1016/j.csda.2026.108347","url":null,"abstract":"<div><div>High dimensional Gaussian graphical models provide a rigorous framework to describe a network of statistical dependencies between variables, such as genes in genomic regulation studies or species in ecology. Penalized methods, including the standard Graphical-Lasso, are well-known approaches to infer the parameters of these models. As the number of variables in the model grows, network inference and interpretation become more complex. The Normal-Block model, a model that clusters variables and considers a network at the cluster level, is discussed. This both adds structure to the network and reduces the number of parameters at stake, thereby easing the inference and interpretation of the underlying network. The approach builds on Graphical-Lasso to add a penalty on the network’s edges and limit the detection of spurious dependencies. A zero-inflated version of the model is also proposed to account for real-world data properties. For the inference procedure, two approaches are introduced: a two-step method based on existing approaches and an original, more rigorous method that simultaneously infers the clustering of variables and the association network between clusters, using a penalized variational Expectation-Maximization approach. An implementation of the model in R, in a package called <strong>normalblockr</strong>, is available on GitHub<span><span><sup>1</sup></span></span>. The results of the models in terms of clustering and network inference are presented, using both simulated data and various types of real-world data (proteomics and word occurrences on webpages).</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"219 ","pages":"Article 108347"},"PeriodicalIF":1.6,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Renewable penalized linear regression via inverse probability weighting for streaming data with missing covariates","authors":"Kang Meng, Yujie Gai","doi":"10.1016/j.csda.2025.108338","DOIUrl":"10.1016/j.csda.2025.108338","url":null,"abstract":"<div><div>A renewable weighted estimation method for linear regression with non-convex regularization is proposed, tailored for streaming data with missing covariates. The proposed method is implemented via a two-step estimation strategy. In the first step, a renewable formulation of the parameter of interest in the propensity score function is derived. Based on this, a renewable weighted optimization objective for the regression coefficients is constructed in the second step, which is updated using the current data and summary statistics from historical data. The objective is solved via a locally adaptive majorize-minimization algorithm with previous estimates as initialization, while the penalty parameter is determined using the proposed online rolling validation procedure. Theoretical results demonstrate that the renewable estimator is asymptotically normal and maintains estimation efficiency compared to offline methods that process all data at once. Simulation studies and real data analysis further confirm that the proposed estimator achieves competitive statistical performance while significantly improving computational efficiency and reducing memory requirements.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"219 ","pages":"Article 108338"},"PeriodicalIF":1.6,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A smoothed maximum rank correlation estimator for deep ordinal choice models","authors":"Yiwei Fan , Xiaoshi Lu , Xiaoling Lu","doi":"10.1016/j.csda.2026.108345","DOIUrl":"10.1016/j.csda.2026.108345","url":null,"abstract":"<div><div>A smoothed maximum rank correlation (MRC) estimator for ordinal choice models is introduced, combining a linear function with a nonlinear component modeled by deep neural networks to achieve both identifiability and interpretability. A two-step estimation algorithm is designed that maintains the order relations among outputs without relying on the parallelism assumption, enhancing its practical applicability. The statistical properties of the smoothed MRC estimator are established under regularity conditions, including identification, convergence rate, and minimax optimality, while allowing the number of categories to increase with sample size. Our theoretical results extend beyond ordinal choice models and apply to a broad range of generalized regression models. Extensive simulations demonstrate the superiority of the proposed method in classification accuracy and interpretability. Its effectiveness is further validated through applications to twelve benchmark datasets and an online education dataset.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"219 ","pages":"Article 108345"},"PeriodicalIF":1.6,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated specification search for composite-based structural equation modeling: A genetic approach","authors":"Laura Trinchera , Gloria Pietropolli , Mauro Castelli , Florian Schuberth","doi":"10.1016/j.csda.2026.108348","DOIUrl":"10.1016/j.csda.2026.108348","url":null,"abstract":"<div><div>Structural Equation Modeling (SEM) is primarily employed as a confirmatory approach for empirically testing theoretical models by assessing how well they fit collected data. In practice, researchers frequently take a more exploratory approach and manually assess alternative models. Although automated search techniques have been developed for factor-based SEM to identify the best-fitting model, automated specification search remains largely unexplored in composite-based SEM. To address this gap, a new method is introduced: Automated Genetic Algorithm Specification Search for Partial Least Squares Path Modeling (AGAS-PLS). The proposed algorithm combines partial least squares path modeling with a genetic algorithm to identify the “best” structural model. A Monte Carlo simulation was conducted to assess the ability of AGAS-PLS to accurately identify the structural model of the data-generating process under various conditions, including different sample sizes and levels of model complexity. The practical applicability of AGAS-PLS was further illustrated using empirical data.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"219 ","pages":"Article 108348"},"PeriodicalIF":1.6,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Certifiably optimal direction estimation in sparse single-index model","authors":"Yangzhou Chen , Lei Yan , Xin Chen , Shuaida He","doi":"10.1016/j.csda.2025.108307","DOIUrl":"10.1016/j.csda.2025.108307","url":null,"abstract":"<div><div>In this paper, we propose a novel method for coefficient estimation in sparse single-index models (SIM). Our approach employs a customized branch-and-bound algorithm to efficiently solve the non-convex problem of sparse direction estimation, which arises from the discrete nature of variable selection. To address this non-convex optimization problem, we derive upper bounds using techniques such as spectral decomposition, matrix inequalities, and the Gershgorin circle theorem, while the lower bounds are obtained through methods like vector truncation and adaptations of the Rifle algorithm. Furthermore, we design customized branching and node selection strategies, with hyperparameters chosen based on AIC, BIC, and HBIC criteria. We prove the convergence of our algorithm, ensuring it reliably reaches optimal solutions. Extensive simulation studies and real data analysis further illustrate the reliable performance and applicability of our proposed method.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"219 ","pages":"Article 108307"},"PeriodicalIF":1.6,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and robust block designs for order-of-addition experiments","authors":"Chang-Yun Lin","doi":"10.1016/j.csda.2026.108346","DOIUrl":"10.1016/j.csda.2026.108346","url":null,"abstract":"<div><div>Designs for Order-of-Addition (OofA) experiments have received growing attention due to their impact on responses based on the sequence of component addition. In certain cases, these experiments involve heterogeneous groups of units, which necessitates the use of blocking to manage variation effects. Despite this, the exploration of block OofA designs remains limited in the literature. As experiments become increasingly complex, addressing this gap is essential to ensure that the designs accurately reflect the effects of the addition sequence and effectively handle the associated variability. Motivated by this, the study seeks to address the gap by expanding the indicator function framework for block OofA designs. The word length pattern is proposed as a criterion for selecting robust block OofA designs. To improve search efficiency and reduce computational demands, an algorithm is developed that employs orthogonal Latin squares for design construction and selection, thereby minimizing the need for exhaustive searches. The analysis, supported by correlation plots, reveals that the algorithm effectively manages confounding and aliasing between effects. Additionally, simulation studies indicate that designs based on the proposed criterion and algorithm achieve power and type I error rates comparable to those of full block OofA designs. This approach offers a practical and efficient method for constructing block OofA designs and may provide valuable insights for future research and applications.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"219 ","pages":"Article 108346"},"PeriodicalIF":1.6,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Copula-based mixtures of regression models for multivariate response data","authors":"Xuetong Cui , Orla A. Murphy , Paul D. McNicholas","doi":"10.1016/j.csda.2026.108340","DOIUrl":"10.1016/j.csda.2026.108340","url":null,"abstract":"<div><div>Clustering is a powerful technique for uncovering hidden patterns or subgroups within complex datasets. Recently, the use of mixtures of multiple linear regression models has gained popularity due to their ability to account for underlying heterogeneity in regression-type data and to provide a comprehensive understanding of covariate impacts across latent subgroups. However, models tailored for a multivariate response are relatively rare, especially when the response variables are dependent. Copula regression addresses this issue by employing copulas to model dependencies between response variables. Accordingly, a copula-based finite mixture of regression models is proposed for clustering and interpreting covariate effects in heterogeneous multivariate continuous response data. An expectation-conditional-maximization algorithm is used to estimate the model. Simulation studies and real-data analyses illustrate the improved clustering performance of the proposed models compared to existing methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108340"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Likelihood inference in Gaussian copula models for count time series via minimax exponential tilting","authors":"Quynh Nhu Nguyen, Victor De Oliveira","doi":"10.1016/j.csda.2026.108344","DOIUrl":"10.1016/j.csda.2026.108344","url":null,"abstract":"<div><div>Count time series arise in diverse contexts and may display a diversity of distributional features, including overdispersion, zero-inflation, covariate effects and complex dependence structures. A class of models with the potential to account for this diversity is that of Gaussian copulas, which are computationally challenging to fit. A scalable and accurate likelihood approximation strategy is proposed that employs minimax exponential tilting (MET) to fit Gaussian copula models with arbitrary marginals and ARMA latent processes to count time series. The proposed method, called <em>Time Series Minimax Exponential Tilting</em> (TMET), exploits the exact conditional structure of causal and invertible ARMA processes to construct an optimized importance sampling density. Costly Cholesky decompositions are avoided by using a simplified Innovations algorithm to recursively compute conditional means and variances, and computation is further accelerated through a sparse representation of the best linear prediction matrix. These innovations achieve linear computational complexity in the series length, while preserving key theoretical guarantees, including vanishing relative error in rare-event regimes. Simulation studies show that TMET outperforms widely used methods, including the Geweke–Hajivassiliou–Keane (GHK) simulator and the recent Vecchia-based MET (VMET) approach, especially in scenarios with low counts, strong dependence, and moving average latent processes. Beyond estimation, the copula framework is extended to include predictive inference and model diagnostics based on scoring rules and randomized quantile residuals. A real-world application to temperature data from the Kickapoo Downtown Airport in Texas demonstrates TMET’s advantages over the commonly used GHK simulator.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108344"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online and offline robust multivariate linear regression","authors":"Antoine Godichon-Baggioni , Stéphane Robin , Laure Sansonnet","doi":"10.1016/j.csda.2026.108341","DOIUrl":"10.1016/j.csda.2026.108341","url":null,"abstract":"<div><div>The robust estimation of the parameters of multivariate Gaussian linear regression models is considered by using robust versions of the usual (Mahalanobis) least-squares criterion, with or without Ridge regularization. Two methods of estimation are introduced: (i) online stochastic gradient descent algorithms and their averaged variants, and (ii) offline fixed-point algorithms. These methods are applied to both the standard and Mahalanobis least-squares criteria, as well as to their regularized counterparts. Under weak assumptions, the resulting estimators are shown to be asymptotically normal. Since the noise covariance matrix is generally unknown, a robust estimate of this matrix is incorporated into the Mahalanobis-based stochastic gradient descent algorithms. Numerical experiments on synthetic data demonstrate a substantial gain in robustness compared with classical least-squares estimators, while also highlighting the computational efficiency of the online procedures. All proposed algorithms are implemented in the <span>R</span> package <span>RobRegression</span>, available on CRAN.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108341"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive variational Gaussian approximation with the Whittle likelihood for linear non-Gaussian state space models","authors":"Bao Anh Vu , David Gunawan , Andrew Zammit-Mangion","doi":"10.1016/j.csda.2025.108324","DOIUrl":"10.1016/j.csda.2025.108324","url":null,"abstract":"<div><div>Parameter inference for linear and non-Gaussian state space models is challenging because the likelihood function contains an intractable integral over the latent state variables. While Markov chain Monte Carlo (MCMC) methods provide exact samples from the posterior distribution as the number of samples goes to infinity, they tend to have high computational cost, particularly for observations of a long time series. When inference with MCMC methods is computationally expensive, variational Bayes (VB) methods are a useful alternative. VB methods approximate the posterior density of the parameters with a simple and tractable distribution found through optimisation. A novel sequential VB algorithm that makes use of the Whittle likelihood is proposed for computationally efficient parameter inference in linear, non-Gaussian state space models. The algorithm, called Recursive Variational Gaussian Approximation with the Whittle Likelihood (R-VGA-Whittle), updates the variational parameters by processing data in the frequency domain. At each iteration, R-VGA-Whittle requires the gradient and Hessian of the Whittle log-likelihood, which are available in closed form. Through several examples involving a linear Gaussian state space model; a univariate/bivariate stochastic volatility model; and a state space model with Student’s t measurement error, where the latent states follow an autoregressive fractionally integrated moving average (ARFIMA) model, R-VGA-Whittle is shown to provide good approximations to posterior distributions of the parameters, and it is very computationally efficient when compared to asymptotically exact methods such as Hamiltonian Monte Carlo.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108324"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}