{"title":"High-dimensional response growth curve modeling for longitudinal neuroimaging analysis","authors":"Lu Wang , Xiang Lyu , Lexin Li","doi":"10.1016/j.csda.2025.108239","DOIUrl":"10.1016/j.csda.2025.108239","url":null,"abstract":"<div><div>There is increasing interest in modeling high-dimensional longitudinal outcomes in applications such as developmental neuroimaging research. Growth curve model offers a useful tool to capture both the mean growth pattern across individuals, as well as the dynamic changes of outcomes over time within each individual. However, when the number of outcomes is large, it becomes challenging and often infeasible to tackle the large covariance matrix of the random effects involved in the model. A high-dimensional response growth curve model, with three novel components, is proposed: a low-rank factor model structure that substantially reduces the number of parameters in the large covariance matrix, a re-parameterization formulation coupled with a sparsity penalty that selects important fixed and random effect terms, and a computational trick that turns the inversion of a large matrix into the inversion of a stack of small matrices and thus considerably speeds up the computation. An efficient expectation-maximization-type estimation algorithm is developed, and the competitive performance of the proposed method is demonstrated through both simulations and a longitudinal study of brain structural connectivity in association with human immunodeficiency virus.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108239"},"PeriodicalIF":1.5,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144580699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneously detecting spatiotemporal changes with penalized Poisson regression models","authors":"Zerui Zhang , Xin Wang , Xin Zhang , Jing Zhang","doi":"10.1016/j.csda.2025.108240","DOIUrl":"10.1016/j.csda.2025.108240","url":null,"abstract":"<div><div>In the realm of large-scale spatiotemporal data, abrupt changes are commonly occurring across both spatial and temporal domains. To address the concurrent challenges of detecting change points and identifying spatial clusters within spatiotemporal count data, an innovative method is introduced based on the Poisson regression model. The proposed method employs doubly fused penalization to unveil the underlying spatiotemporal change patterns. To efficiently estimate the model, an iterative shrinkage and threshold based algorithm is developed to minimize the doubly penalized likelihood function. The reliability and accuracy is confirmed by the statistical consistency properties. Furthermore, extensive numerical experiments are conducted to validate the theoretical findings, thereby highlighting the superior performance of the proposed method when compared to existing competitive approaches.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108240"},"PeriodicalIF":1.5,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144596281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New tests for the identity and sphericity of high-dimensional covariance matrices via U-statistics","authors":"Xiaoge Xiong","doi":"10.1016/j.csda.2025.108242","DOIUrl":"10.1016/j.csda.2025.108242","url":null,"abstract":"<div><div>Two novel test procedures are proposed for the identity and sphericity of covariance matrices in high-dimensional asymptotic frameworks, both constructed via U-statistics. The limiting distributions of these tests are established under null and local alternative hypotheses. Monte Carlo simulation results demonstrate their superiority over several competing methods across various scenarios, with the proposed tests achieving full power against both dense and sparse alternatives. The effectiveness of the proposed tests is further validated through an application to a colon dataset.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108242"},"PeriodicalIF":1.5,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144570845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional and banded integer-valued autoregressive processes","authors":"Nuo Xu, Kai Yang","doi":"10.1016/j.csda.2025.108243","DOIUrl":"10.1016/j.csda.2025.108243","url":null,"abstract":"<div><div>The modeling of high-dimensional time series has always been an appealing and challenging problem. The main difficulties of modeling high-dimensional time series lie in the curse of dimensionality and complex cross dependence between adjacent components. To solve these problems for high-dimensional time series of counts, a class of high-dimensional and banded integer-valued autoregressive processes without assuming the innovation's distribution is proposed. A banded thinning structure is constructed to diminish the parameters' dimension. The componentwise conditional least squares and weighted conditional least squares methods are developed to estimate the banded autoregressive coefficient matrices. The bandwidth parameter is identified via a marginal Bayesian information criterion method. Some numerical results are provided to show the good performance of the estimators. Finally, the superiority of the proposed model is shown by an application to an air quality data set of different cities.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108243"},"PeriodicalIF":1.5,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional inference for ultrahigh-dimensional additive hazards model","authors":"Meiling Hao , Ruiyu Yang , Fangfang Bai , Liuquan Sun","doi":"10.1016/j.csda.2025.108244","DOIUrl":"10.1016/j.csda.2025.108244","url":null,"abstract":"<div><div>In the realm of high-throughput genomic data, modeling with ultrahigh-dimensional covariates and censored survival outcomes is of great importance. We conduct conditional inference for the ultrahigh-dimensional additive hazards model, allowing both the covariates of interest and nuisance covariates to be ultrahigh-dimensional. The presence of right censorship with survival outcomes adds an extra layer of complexity to the original data structure, posing significant challenges for the ultrahigh-dimensional additive hazards model. To address this, we introduce an innovative test statistic based on the quadratic norm of the score function. Moreover, when there is a high correlation between the covariates of interest and nuisance covariates, we propose a decorrelated score function-based test statistic to enhance statistical power. Additionally, we establish the limiting distributions of the test statistics under both the null and local alternative hypotheses, further enhancing the computational appeal of our approach. The proposed statistics are thoroughly evaluated through extensive simulation studies and applied to two real data examples.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108244"},"PeriodicalIF":1.5,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pure interaction effects unseen by Random Forests","authors":"Ricardo Blum , Munir Hiabu , Enno Mammen , Joseph T. Meyer","doi":"10.1016/j.csda.2025.108237","DOIUrl":"10.1016/j.csda.2025.108237","url":null,"abstract":"<div><div>Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions that the conventional CART criterion struggles to capture during tree construction. Motivated from this, it is argued that simple alternative partitioning schemes used in the tree growing procedure can enhance identification of these interactions. In a simulation study these variants are compared to conventional Random Forests and Extremely Randomized Trees. The results validate that the modifications considered enhance the model's fitting ability in scenarios where pure interactions play a crucial role. Finally, the methods are applied to real datasets.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108237"},"PeriodicalIF":1.5,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable selection for spatio-temporal conditionally Poisson point processes","authors":"Achmad Choiruddin , Jonatan A. González , Jorge Mateu , Alwan Fadlurohman , Rasmus Waagepetersen","doi":"10.1016/j.csda.2025.108238","DOIUrl":"10.1016/j.csda.2025.108238","url":null,"abstract":"<div><div>Spatio-temporal point pattern data are becoming prevalent in many scientific disciplines. We consider a sequence of spatial point processes where each point process is Poisson given the past. We model the conditional first-order intensity function of each point process as a parametric log-linear function of spatial, temporal, and spatio-temporal covariates that may depend on previous point patterns. Dealing with spatio-temporal covariates brings computational and methodological challenges compared to the purely spatial case. We extend regularisation methods for spatial point process variable selection to obtain parsimonious and interpretable models in the considered spatio-temporal case. Using our proposed methodology, we conduct two simulation studies and examine an application to criminal activity in the Kennedy district of Bogota. In the application, we consider a spatio-temporal point pattern data of crime locations and a number of spatial, temporal, and spatio-temporal covariates related to urban places, environmental factors, and further space-time factors. The intensity function of vehicle thefts is estimated, considering other crimes as covariate information. The proposed methodology offers a comprehensive approach for analysing spatio-temporal point pattern crime data, capturing complex relationships between covariates and crime occurrences over space and time.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108238"},"PeriodicalIF":1.5,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144535764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A score-based threshold effect test in time series models","authors":"Shufang Wei , Yaping Deng , Yaxing Yang","doi":"10.1016/j.csda.2025.108236","DOIUrl":"10.1016/j.csda.2025.108236","url":null,"abstract":"<div><div>A score-based test statistic is developed to compare a linear ARMA model with its threshold extension. In particular, the focus is on testing the threshold effect in continuous threshold models with no jump at the threshold. Notably, while developed for continuous threshold models, the proposed test remains effective for discontinuous cases. The proposed test does not require fitting the model under the alternative hypothesis, making it computationally more efficient than the quasi-likelihood ratio test. The asymptotic distributions of the score-based test statistic are derived under both the null hypothesis and local alternatives. Simulations indicate that the proposed test has better size than the quasi-likelihood ratio test and demonstrates stronger power compared to the Lagrange Multiplier test. The asymptotic theory of the least square estimation for the continuous threshold ARMA model is further established. An application to the quarterly U.S. civilian unemployment rates data is given.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108236"},"PeriodicalIF":1.5,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144491125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian selection approach for categorical responses via multinomial probit models","authors":"Chi-Hsiang Chu , Kuo-Jung Lee , Chien-Chin Hsu , Ray-Bing Chen","doi":"10.1016/j.csda.2025.108233","DOIUrl":"10.1016/j.csda.2025.108233","url":null,"abstract":"<div><div>A multinomial probit model is proposed to examine a categorical response variable, with the main objective being the identification of the influential variables in the model. To this end, a Bayesian selection technique using two hierarchical indicators is employed. The first indicator denotes a variable's relevance to the categorical response, and the subsequent indicator relates to the variable's importance at a specific categorical level, which aids in assessing its impact at that level. The selection process relies on the posterior indicator samples generated through an MCMC algorithm. The efficacy of our Bayesian selection strategy is demonstrated through both simulation and an application to a real-world example.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108233"},"PeriodicalIF":1.5,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144338393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-based clustering for covariance matrices via penalized Wishart mixture models","authors":"Andrea Cappozzo , Alessandro Casa","doi":"10.1016/j.csda.2025.108232","DOIUrl":"10.1016/j.csda.2025.108232","url":null,"abstract":"<div><div>Covariance matrices provide a valuable source of information about complex interactions and dependencies within the data. However, from a clustering perspective, this information has often been underutilized and overlooked. Indeed, commonly adopted distance-based approaches tend to rely primarily on mean levels to characterize and differentiate between groups. Recently, there have been promising efforts to cluster covariance matrices directly, thereby distinguishing groups solely based on the relationships between variables. From a model-based perspective, a probabilistic formalization has been provided by considering a mixture model with component densities following a Wishart distribution. Notwithstanding, this approach faces challenges when dealing with a large number of variables, as the number of parameters to be estimated increases quadratically. To address this issue, a sparse Wishart mixture model is proposed, which assumes that the component scale matrices possess a cluster-dependent degree of sparsity. Model estimation is performed by maximizing a penalized log-likelihood, enforcing a covariance graphical lasso penalty on the component scale matrices. This penalty not only reduces the number of non-zero parameters, mitigating the challenges of high-dimensional settings, but also enhances the interpretability of results by emphasizing the most relevant relationships among variables. The proposed methodology is tested on both simulated and real data, demonstrating its ability to unravel the complexities of neuroimaging data and effectively cluster subjects based on the relational patterns among distinct brain regions.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"212 ","pages":"Article 108232"},"PeriodicalIF":1.5,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144338394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}