{"title":"JANE: Just Another latent space NEtwork clustering algorithm","authors":"Alan T. Arakkal, Daniel K. Sewell","doi":"10.1016/j.csda.2025.108228","DOIUrl":"10.1016/j.csda.2025.108228","url":null,"abstract":"<div><div>While latent space network models have been a popular approach for community detection for over 15 years, major computational challenges remain, limiting the ability to scale beyond small networks. The R statistical software package, <span>JANE</span>, introduces a new estimation algorithm with massive speedups derived from: (1) a low dimensional approximation approach to adjust for degree heterogeneity parameters; (2) an approximation of intractable likelihood terms; (3) a fast initialization algorithm; and (4) a novel set of convergence criteria focused on clustering performance. Additionally, the proposed method addresses limitations of current implementations, which rely on a restrictive spherical-shape assumption for the prior distribution on the latent positions; relaxing this constraint allows for greater flexibility across diverse network structures. A simulation study evaluating clustering performance of the proposed approach against state-of-the-art methods shows dramatically improved clustering performance in most scenarios and significant reductions in computational time, up to 45 times faster compared to existing approaches.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108228"},"PeriodicalIF":1.5,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144222027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian forecasting of Italian seismicity using the spatiotemporal RETAS model","authors":"Tom Stindl , Zelong Bi , Clara Grazian","doi":"10.1016/j.csda.2025.108219","DOIUrl":"10.1016/j.csda.2025.108219","url":null,"abstract":"<div><div>Spatiotemporal Renewal Epidemic Type Aftershock Sequence models are self-exciting point processes that model the occurrence time, epicenter, and magnitude of earthquakes in a geographical region. The arrival rate of earthquakes is formulated as the superposition of a main shock renewal process and homogeneous Poisson processes for the aftershocks, motivated by empirical laws in seismology. Existing methods for model fitting rely on maximizing the log-likelihood by either direct numerical optimization or Expectation Maximization algorithms, both of which can suffer from convergence issues and lack adequate quantification of parameter estimation uncertainty. To address these limitations, a Bayesian approach is employed, with posterior inference carried out using a data augmentation strategy within a Markov chain Monte Carlo framework. The branching structure is treated as a latent variable to improve sampling efficiency, and a purpose-built Hamiltonian Monte Carlo sampler is implemented to update the parameters within the Gibbs sampler. This methodology enables parameter uncertainty to be incorporated into forecasts of seismicity. Estimation and forecasting are demonstrated on simulated catalogs and an earthquake catalog from Italy. <span>R</span> code implementing the methods is provided in the Supplementary Materials.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108219"},"PeriodicalIF":1.5,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144261610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small area prediction of counts under machine learning-type mixed models","authors":"Nicolas Frink, Timo Schmid","doi":"10.1016/j.csda.2025.108218","DOIUrl":"10.1016/j.csda.2025.108218","url":null,"abstract":"<div><div>Small area estimation methods are proposed that use generalized tree-based machine learning techniques to improve the estimation of disaggregated means in small areas using discrete survey data. Specifically, two existing approaches based on random forests, the Generalized Mixed Effects Random Forest (GMERF) and the Mixed Effects Random Forest (MERF), are extended to accommodate count outcomes, addressing key challenges such as overdispersion. Additionally, three bootstrap methodologies designed to assess the reliability of point estimators for area-level means are evaluated. The numerical analysis shows that the MERF, which does not assume a Poisson distribution to model the mean behavior of count data, excels in scenarios of severe overdispersion. Conversely, the GMERF performs best under conditions where Poisson distribution assumptions are moderately met. In a case study using real-world data from the state of Guerrero, Mexico, the proposed methods effectively estimate area-level means while capturing the uncertainty inherent in overdispersed count data. These findings highlight their practical applicability for small area estimation.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108218"},"PeriodicalIF":1.5,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144196139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Frisch-Waugh-Lovell theorem for empirical likelihood","authors":"Yichun Song","doi":"10.1016/j.csda.2025.108208","DOIUrl":"10.1016/j.csda.2025.108208","url":null,"abstract":"<div><div>A Frisch-Waugh-Lovell-type (FWL) theorem for empirical likelihood estimation with instrumental variables is presented, which resembles the standard FWL theorem in ordinary least squares (OLS), but its partitioning procedure employs the empirical likelihood weights at the solution rather than the original sample distribution. This result is leveraged to simplify the computational process through an iterative algorithm, where exogenous variables are partitioned out using weighted least squares, and the weights are updated between iterations. Furthermore, it is demonstrated that iterations converge locally to the original empirical likelihood estimate at a stochastically super-linear rate. A feasible iterative constrained optimization algorithm for calculating empirical-likelihood-based confidence intervals is provided, along with a discussion of its properties. Monte Carlo simulations indicate that the iterative algorithm is robust and produces results within the numerical tolerance of the original empirical likelihood estimator in finite samples, while significantly improving computation in large-scale problems. Additionally, the algorithm performs effectively in an illustrative application using the return to education framework.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108208"},"PeriodicalIF":1.5,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144137907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive-to-sub-null testing for mediation effects in structural equation models","authors":"Jiaqi Huang , Chuyun Ye , Lixing Zhu","doi":"10.1016/j.csda.2025.108205","DOIUrl":"10.1016/j.csda.2025.108205","url":null,"abstract":"<div><div>To effectively implement large-scale hypothesis testing of causal mediation effects and control the false discovery rate (FDR) for linear structural equation models, this paper proposes an Adaptive-to-Sub-Null test (AtST) tailored specifically for the assessment of multidimensional mediation effects. The key distinction of AtST from existing methods is that, for every mediator, the weak limits of the test statistic under all mutually exclusive sub-null hypotheses uniformly conform to a chi-square distribution with one degree of freedom. Therefore, in the asymptotic sense, the significance level can be maintained and the <em>p</em>-values can be computed easily without any prior information on the sub-null hypotheses or any resampling technique. In theoretical investigations, we extend existing parameter estimation methods by allowing a lower sparsity level in high-dimensional covariate vectors. These results offer a solid basis for better FDR control by directly applying the classical Storey's method. We also apply a data-driven approach for selecting the tuning parameter of Storey's estimator. Simulations are conducted to demonstrate the efficacy and validity of the AtST, complemented by an analysis of a real dataset for illustration.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108205"},"PeriodicalIF":1.5,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact statistical analysis for response-adaptive clinical trials: A general and computationally tractable approach","authors":"Stef Baas , Peter Jacko , Sofía S. Villar","doi":"10.1016/j.csda.2025.108207","DOIUrl":"10.1016/j.csda.2025.108207","url":null,"abstract":"<div><div>Response-adaptive clinical trial designs allow targeting a given objective by skewing the allocation of participants to treatments based on observed outcomes. Response-adaptive designs face greater regulatory scrutiny due to potential type I error rate inflation, which limits their uptake in practice. Existing approaches for type I error control either only work for specific designs, have a risk of Monte Carlo/approximation error, are conservative, or are computationally intractable. To this end, a general and computationally tractable approach is developed for exact analysis in two-arm response-adaptive designs with binary outcomes. This approach can construct exact tests for designs using either a randomized or deterministic response-adaptive procedure. The constructed conditional and unconditional exact tests generalize Fisher's and Barnard's exact tests, respectively. Furthermore, the approach allows for complexities such as delayed outcomes, early stopping, or allocation of participants in blocks. The efficient implementation of forward recursion allows for testing of two-arm trials with 1,000 participants on a standard computer. Through an illustrative computational study of trials using randomized dynamic programming it is shown that, contrary to what is known for equal allocation, the conditional exact Wald test based on total successes has, almost uniformly, higher power than the unconditional exact Wald test. Two real-world trials with the above-mentioned complexities are re-analyzed to demonstrate the value of the new approach in controlling type I errors and/or improving the statistical power.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108207"},"PeriodicalIF":1.5,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144099882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dirichlet stochastic block model for composition-weighted networks","authors":"Iuliia Promskaia , Adrian O'Hagan , Michael Fop","doi":"10.1016/j.csda.2025.108204","DOIUrl":"10.1016/j.csda.2025.108204","url":null,"abstract":"<div><div>Network data are prevalent in applications where individual entities interact with each other, and often these interactions have associated weights representing the strength of association. Clustering such weighted network data is a common task, which involves identifying groups of nodes that display similarities in the way they interact. However, traditional clustering methods typically use edge weights in their raw form, overlooking that the observed weights are influenced by the nodes' capacities to distribute weights along the edges. This can lead to clustering results that primarily reflect nodes' total weight capacities rather than the specific interactions between them. One way to address this issue is to analyse the strengths of connections in relative rather than absolute terms, by transforming the relational weights into a compositional format. This approach expresses each edge weight as a proportion of the sending or receiving weight capacity of the respective node. To cluster these data, a Dirichlet stochastic block model tailored for composition-weighted networks is proposed. The model relies on direct modelling of compositional weight vectors using a Dirichlet mixture, where parameters are determined by the cluster labels of sender and receiver nodes. Inference is implemented via an extension of the classification expectation-maximisation algorithm, expressing the complete data likelihood of each node as a function of fixed cluster labels of the remaining nodes. A model selection criterion is derived to determine the optimal number of clusters. The proposed approach is validated through simulation studies, and its practical utility is illustrated on two real-world networks.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108204"},"PeriodicalIF":1.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Penalized maximum likelihood estimation with nonparametric Gaussian scale mixture errors","authors":"Seo-Young Park , Byungtae Seo","doi":"10.1016/j.csda.2025.108206","DOIUrl":"10.1016/j.csda.2025.108206","url":null,"abstract":"<div><div>The penalized least squares and maximum likelihood methods have been successfully employed for simultaneous parameter estimation and variable selection. However, outlying observations can severely affect the quality of the estimator and selection performance. Although some robust methods for variable selection have been proposed in the literature, they often lose substantial efficiency. This is primarily attributed to the excessive dependence on choosing additional tuning parameters or modifying the original objective functions as tools to enhance robustness. In response to these challenges, we use a nonparametric Gaussian scale mixture distribution for the regression error distribution. This approach allows the error distributions in the model to achieve great flexibility and provides data-adaptive robustness. Our proposed estimator exhibits desirable theoretical properties, including sparsity and oracle properties. In the estimation process, we employ a combination of expectation-maximization and gradient-based algorithms for the parametric and nonparametric components, respectively. Through comprehensive numerical studies, encompassing simulation studies and real data analysis, we substantiate the robust performance of the proposed method.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108206"},"PeriodicalIF":1.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heavy-tailed matrix-variate hidden Markov models","authors":"Salvatore D. Tomarchio","doi":"10.1016/j.csda.2025.108198","DOIUrl":"10.1016/j.csda.2025.108198","url":null,"abstract":"<div><div>The matrix-variate framework for hidden Markov models (HMMs) is expanded with two families of models using matrix-variate <em>t</em> and contaminated normal distributions. These models improve the handling of tail behavior, clustering, and address challenges in identifying outlying matrices in matrix-variate data. Two Expectation-Conditional Maximization (ECM) algorithms are implemented in the R package <strong>MatrixHMM</strong> for parameter estimation. Simulations assess parameter recovery, robustness, anomaly detection, and show the advantages over alternative approaches. The models are applied to real-world data to analyze labor market dynamics across Italian provinces.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108198"},"PeriodicalIF":1.5,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical inference for partially shape-constrained function-on-scalar linear regression models","authors":"Kyunghee Han , Yeonjoo Park , Soo-Young Kim","doi":"10.1016/j.csda.2025.108200","DOIUrl":"10.1016/j.csda.2025.108200","url":null,"abstract":"<div><div>Functional linear regression models are widely used to link functional/longitudinal outcomes with multiple scalar predictors, identifying time-varying covariate effects through regression coefficient functions. Beyond assessing statistical significance, characterizing the shapes of coefficient functions is crucial for drawing interpretable scientific conclusions. Existing studies on shape-constrained analysis primarily focus on global shapes, which require strict prior knowledge of functional relationships across the entire domain. This often leads to misspecified regression models due to a lack of prior information, making them impractical for real-world applications. To address this, a flexible framework is introduced to identify partial shapes in regression coefficient functions. The proposed partial shape-constrained analysis enables researchers to validate functional shapes within a targeted sub-domain, avoiding the misspecification of shape constraints outside the sub-domain of interest. The method also allows for testing different sub-domains for individual covariates and multiple partial shape constraints across composite sub-domains. Our framework supports both kernel- and spline-based estimation approaches, ensuring robust performance with flexibility in computational preference. Finite-sample experiments across various scenarios demonstrate that the proposed framework significantly outperforms the application of global shape constraints to partial domains in both estimation and inference procedures. In particular, the inferential tool maintains the type I error rate at the nominal significance level and exhibits increasing power with larger sample sizes, confirming the consistency of the test procedure. The practicality of partial shape-constrained inference is demonstrated through two applications: a clinical trial on NeuroBloc for type A-resistant cervical dystonia and the National Institute of Mental Health Schizophrenia Study.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"211 ","pages":"Article 108200"},"PeriodicalIF":1.5,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144083910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}