Computational Statistics & Data Analysis: Latest Articles (Page 5)

Model-based clustering for covariance matrices via penalized Wishart mixture models

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-20 DOI: 10.1016/j.csda.2025.108232

Andrea Cappozzo, Alessandro Casa

Abstract: Covariance matrices provide a valuable source of information about complex interactions and dependencies within the data. However, from a clustering perspective, this information has often been underutilized and overlooked. Indeed, commonly adopted distance-based approaches tend to rely primarily on mean levels to characterize and differentiate between groups. Recently, there have been promising efforts to cluster covariance matrices directly, thereby distinguishing groups solely based on the relationships between variables. From a model-based perspective, a probabilistic formalization has been provided by considering a mixture model with component densities following a Wishart distribution. Notwithstanding, this approach faces challenges when dealing with a large number of variables, as the number of parameters to be estimated increases quadratically. To address this issue, a sparse Wishart mixture model is proposed, which assumes that the component scale matrices possess a cluster-dependent degree of sparsity. Model estimation is performed by maximizing a penalized log-likelihood, enforcing a covariance graphical lasso penalty on the component scale matrices. This penalty not only reduces the number of non-zero parameters, mitigating the challenges of high-dimensional settings, but also enhances the interpretability of results by emphasizing the most relevant relationships among variables. The proposed methodology is tested on both simulated and real data, demonstrating its ability to unravel the complexities of neuroimaging data and effectively cluster subjects based on the relational patterns among distinct brain regions.

Volume 212, Article 108232
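As a rough sketch of the objective this abstract describes, a penalized log-likelihood for a K-component Wishart mixture could be written as below. The notation is assumed here for illustration and may differ from the paper's own parametrization: S_i are the observed covariance matrices, \pi_k the mixing proportions, \nu_k the degrees of freedom, \Sigma_k the component scale matrices, f_W the Wishart density, \lambda \ge 0 the penalty weight, P a weight matrix (typically with zero diagonal), \ast the elementwise product, and \lVert \cdot \rVert_1 the sum of absolute entries.

```latex
\ell_P(\Theta) \;=\; \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, f_W\!\left(S_i \mid \nu_k, \Sigma_k\right)
\;-\; \lambda \sum_{k=1}^{K} \left\lVert P \ast \Sigma_k \right\rVert_1
```

Larger values of \lambda push more off-diagonal entries of each \Sigma_k to zero, which is how a penalty of this form reduces the number of non-zero parameters.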

Citations: 0

Joint estimation of precision matrices for long-memory time series

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-19 DOI: 10.1016/j.csda.2025.108234

Qihu Zhang, Jongik Chung, Cheolwoo Park

Citations: 0

Inference on a stochastic SIR model including growth curves

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-16 DOI: 10.1016/j.csda.2025.108231

Giuseppina Albano, Virginia Giorno, Gema Pérez-Romero, Francisco de Asis Torres-Ruiz

Citations: 0

Privacy-preserving communication-efficient spectral clustering for distributed multiple networks

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-09 DOI: 10.1016/j.csda.2025.108230

Shanghao Wu, Xiao Guo, Hai Zhang

Citations: 0

Flexible modeling of left-truncated and interval-censored competing risks data with missing event types

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-05 DOI: 10.1016/j.csda.2025.108229

Yichen Lou, Yuqing Ma, Liming Xiang, Jianguo Sun

Abstract: Interval-censored competing risks data arise in many cohort studies in clinical research, where multiple types of events subject to interval censoring are included and the occurrence of the primary event of interest may be censored by the occurrence of other events. The presence of missing event types and left truncation poses challenges to the regression analysis of such data. We propose a new two-stage estimation procedure under a class of semiparametric generalized odds rate transformation models to overcome these challenges. Our method first facilitates the estimation of both the probability of response and the probability of occurrence of each type of event under the missing at random assumption, using either parametric or non-parametric methods. An augmented inverse probability weighting likelihood based on the complete-case likelihood and data from subjects with missing type of event is then maximized for estimating regression parameters. We provide desirable asymptotic properties and construct a concordance index to evaluate the model's discriminative ability. The proposed method is demonstrated through extensive simulations and the analysis of data from the Amsterdam cohort study on HIV infection and AIDS.

Volume 211, Article 108229

Citations: 0

Region detection and image clustering via sparse Kronecker product decomposition

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-03 DOI: 10.1016/j.csda.2025.108226

Guang Yang, Long Feng

Abstract: Image clustering is usually conducted by vectorizing image pixels, treating them as independent, and applying classical clustering approaches to the obtained features. However, as image data is often high-dimensional and contains rich spatial information, such treatment is far from satisfactory. For medical image data, another important characteristic is the region-wise sparseness in signals. That is to say, there are only a few unknown regions in the medical image that differentiate the images associated with different groups of patients, while other regions are uninformative. Accurately detecting these informative regions would not only improve clustering accuracy but, more importantly, would also provide interpretations for the rationale behind them. Motivated by the need to identify significant regions of interest, we propose a general framework named Image Clustering via Sparse Kronecker Product Decomposition (IC-SKPD). This framework aims to simultaneously divide samples into clusters and detect regions that are informative for clustering. Our framework is general in the sense that it provides a unified treatment for matrix and tensor-valued samples. An iterative hard-thresholded singular value decomposition approach is developed to solve this model. Theoretically, the IC-SKPD enjoys guarantees for clustering accuracy and region detection consistency under mild conditions on the minimum signals. Comprehensive simulations along with real data analysis further validate the superior performance of IC-SKPD on clustering and region detection.

Volume 211, Article 108226
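To illustrate why a Kronecker product is a natural way to encode region-wise sparsity, the sketch below builds a small "signal" matrix from a sparse block indicator and a within-block pattern. It is only an illustration of the Kronecker structure, not the IC-SKPD estimation algorithm; the names `kron`, `region_indicator`, and `block_pattern` are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(cols_b)]
        for i in range(len(a))
        for k in range(rows_b)
    ]


# Hypothetical example: a 2x2 grid of regions where only the top-left one is informative.
region_indicator = [[1, 0],
                    [0, 0]]
# Hypothetical signal shape inside an informative region.
block_pattern = [[1, 2],
                 [3, 4]]

# The 4x4 result is non-zero only inside the informative region.
signal = kron(region_indicator, block_pattern)
```

A sparse first factor therefore zeroes out whole blocks of the image at once, which is the sense in which sparsity in one factor corresponds to region selection.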

Citations: 0

Distributed iterative hard thresholding for variable selection in Tobit models

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-03 DOI: 10.1016/j.csda.2025.108227

Changxin Yang, Zhongyi Zhu, Hongmei Lin, Zengyan Fan, Heng Lian

Abstract: While there is a substantial body of research on high-dimensional regression with left-censored responses, few methods address this problem in a distributed manner. Due to data transmission limitations and privacy concerns, centralizing all data is often impractical, necessitating a method for collaborative learning with distributed data. In this paper, we employ the Iterative Hard Thresholding (IHT) method for the Tobit model to address this challenge, allowing one to directly specify the desired sparsity and offering an alternative estimation and variable selection approach. Theoretical analysis shows that our estimator achieves a nearly minimax-optimal convergence rate using only a few rounds of communication. Its practical performance is evaluated under both the pooled and the distributed setting. The former highlights its competitive estimation efficiency and variable selection performance compared to existing approaches, while the latter demonstrates that the decentralized estimator closely matches the performance of its centralized counterpart. When applied to high-dimensional left-censored HIV viral load data, our method also demonstrates comparable performance.

Volume 211, Article 108227
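The general idea of iterative hard thresholding with distributed data can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: it swaps in a plain squared-error loss instead of the Tobit likelihood, simply averages per-shard gradients in each communication round, and uses hypothetical names (`hard_threshold`, `local_gradient`, `distributed_iht`) and toy data.

```python
def hard_threshold(beta, s):
    """Keep the s largest-magnitude entries of beta and set the rest to zero."""
    if s <= 0:
        return [0.0] * len(beta)
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:s])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def local_gradient(X, y, beta):
    """Gradient of the squared-error loss (1/2n) * sum (x_i'beta - y_i)^2 on one shard."""
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for xi, yi in zip(X, y):
        resid = sum(xj * bj for xj, bj in zip(xi, beta)) - yi
        for j in range(p):
            grad[j] += resid * xi[j] / n
    return grad


def distributed_iht(shards, p, s, step=0.5, rounds=50):
    """Each round: shards send gradients, the average drives one thresholded step."""
    beta = [0.0] * p
    for _ in range(rounds):
        grads = [local_gradient(X, y, beta) for X, y in shards]
        avg = [sum(g[j] for g in grads) / len(grads) for j in range(p)]
        beta = hard_threshold([b - step * g for b, g in zip(beta, avg)], s)
    return beta


# Toy data (assumed for illustration): y = 2 * x0 exactly, split over two shards.
shards = [
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [2.0, 0.0]),
    ([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]], [0.0, 2.0]),
]
beta_hat = distributed_iht(shards, p=3, s=1)
```

Only gradients cross machine boundaries in this sketch, and the sparsity level `s` is set directly rather than through a penalty weight, which mirrors the two features the abstract emphasizes.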

Citations: 0

JANE: Just Another latent space NEtwork clustering algorithm

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-02 DOI: 10.1016/j.csda.2025.108228

Alan T. Arakkal, Daniel K. Sewell

Abstract: While latent space network models have been a popular approach for community detection for over 15 years, major computational challenges remain, limiting the ability to scale beyond small networks. The R statistical software package, JANE, introduces a new estimation algorithm with massive speedups derived from: (1) a low dimensional approximation approach to adjust for degree heterogeneity parameters; (2) an approximation of intractable likelihood terms; (3) a fast initialization algorithm; and (4) a novel set of convergence criteria focused on clustering performance. Additionally, the proposed method addresses limitations of current implementations, which rely on a restrictive spherical-shape assumption for the prior distribution on the latent positions; relaxing this constraint allows for greater flexibility across diverse network structures. A simulation study evaluating clustering performance of the proposed approach against state-of-the-art methods shows dramatically improved clustering performance in most scenarios and significant reductions in computational time, up to 45 times faster compared to existing approaches.

Volume 211, Article 108228

Citations: 0

Bayesian forecasting of Italian seismicity using the spatiotemporal RETAS model

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-06-02 DOI: 10.1016/j.csda.2025.108219

Tom Stindl, Zelong Bi, Clara Grazian

Abstract: Spatiotemporal Renewal Epidemic Type Aftershock Sequence models are self-exciting point processes that model the occurrence time, epicenter, and magnitude of earthquakes in a geographical region. The arrival rate of earthquakes is formulated as the superposition of a main shock renewal process and homogeneous Poisson processes for the aftershocks, motivated by empirical laws in seismology. Existing methods for model fitting rely on maximizing the log-likelihood by either direct numerical optimization or Expectation Maximization algorithms, both of which can suffer from convergence issues and lack adequate quantification of parameter estimation uncertainty. To address these limitations, a Bayesian approach is employed, with posterior inference carried out using a data augmentation strategy within a Markov chain Monte Carlo framework. The branching structure is treated as a latent variable to improve sampling efficiency, and a purpose-built Hamiltonian Monte Carlo sampler is implemented to update the parameters within the Gibbs sampler. This methodology enables parameter uncertainty to be incorporated into forecasts of seismicity. Estimation and forecasting are demonstrated on simulated catalogs and an earthquake catalog from Italy. R code implementing the methods is provided in the Supplementary Materials.

Volume 212, Article 108219

Citations: 0

Small area prediction of counts under machine learning-type mixed models

IF 1.5, Zone 3 (Mathematics)

Computational Statistics & Data Analysis Pub Date: 2025-05-30 DOI: 10.1016/j.csda.2025.108218

Nicolas Frink, Timo Schmid

Abstract: Small area estimation methods are proposed that use generalized tree-based machine learning techniques to improve the estimation of disaggregated means in small areas using discrete survey data. Specifically, two existing approaches based on random forests - the Generalized Mixed Effects Random Forest (GMERF) and a Mixed Effects Random Forest (MERF) - are extended to accommodate count outcomes, addressing key challenges such as overdispersion. Additionally, three bootstrap methodologies designed to assess the reliability of point estimators for area-level means are evaluated. The numerical analysis shows that the MERF, which does not assume a Poisson distribution to model the mean behavior of count data, excels in scenarios of severe overdispersion. Conversely, the GMERF performs best under conditions where Poisson distribution assumptions are moderately met. In a case study using real-world data from the state of Guerrero, Mexico, the proposed methods effectively estimate area-level means while capturing the uncertainty inherent in overdispersed count data. These findings highlight their practical applicability for small area estimation.

Volume 211, Article 108218
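Overdispersion, which this abstract treats as the deciding factor between the two approaches, can be checked informally with the variance-to-mean ratio of the counts. The sketch below is a generic diagnostic and not part of the paper's methodology; the function name `dispersion_index` and the example counts are made up for illustration.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return statistics.variance(counts) / mean


# Hypothetical counts for one area: mostly small values plus one large outlier.
example_counts = [0, 0, 1, 1, 2, 8]
ratio = dispersion_index(example_counts)  # well above 1, suggesting overdispersion
```

A ratio far above 1 indicates that the variance exceeds what a Poisson model with the same mean would imply, which is the situation where the abstract reports the Poisson-free MERF performing best.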

Citations: 0