Statistics and Computing最新文献_第5页

Greedy recursive spectral bisection for modularity-bound hierarchical divisive community detection 用于模块化约束分层分裂群落检测的贪婪递归光谱分段法

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-27 DOI: 10.1007/s11222-024-10451-3

Douglas O. Cardoso, João Domingos Gomes da Silva Junior, Carla Silva Oliveira, Celso Marques, Laura Silva de Assis

{"title":"Greedy recursive spectral bisection for modularity-bound hierarchical divisive community detection","authors":"Douglas O. Cardoso, João Domingos Gomes da Silva Junior, Carla Silva Oliveira, Celso Marques, Laura Silva de Assis","doi":"10.1007/s11222-024-10451-3","DOIUrl":"https://doi.org/10.1007/s11222-024-10451-3","url":null,"abstract":"Spectral clustering techniques depend on the eigenstructure of a similarity matrix to assign data points to clusters, so that points within the same cluster exhibit high similarity and are compared to those in different clusters. This work aimed to develop a spectral method that could be compared to clustering algorithms that represent the current state of the art. This investigation conceived a novel spectral clustering method, as well as five policies that guide its execution, based on spectral graph theory and embodying hierarchical clustering principles. Computational experiments comparing the proposed method with six state-of-the-art algorithms were undertaken in this study to evaluate the clustering methods under scrutiny. The assessment was performed using two evaluation metrics, specifically the adjusted Rand index, and modularity. The obtained results furnish compelling evidence, indicating that the proposed method is competitive and possesses distinctive properties compared to those elucidated in the existing literature. This suggests that our approach stands as a viable alternative, offering a robust choice within the spectrum of available same-purpose tools.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"11 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Flexible Bayesian quantile regression based on the generalized asymmetric Huberised-type distribution 基于广义非对称胡贝利兹型分布的灵活贝叶斯量化回归

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-27 DOI: 10.1007/s11222-024-10453-1

Weitao Hu, Weiping Zhang

引用次数: 0

Structured prior distributions for the covariance matrix in latent factor models 潜因模型中协方差矩阵的结构化先验分布

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-26 DOI: 10.1007/s11222-024-10454-0

Sarah Elizabeth Heaps, Ian Hyla Jermyn

{"title":"Structured prior distributions for the covariance matrix in latent factor models","authors":"Sarah Elizabeth Heaps, Ian Hyla Jermyn","doi":"10.1007/s11222-024-10454-0","DOIUrl":"https://doi.org/10.1007/s11222-024-10454-0","url":null,"abstract":"Factor models are widely used for dimension reduction in the analysis of multivariate data. This is achieved through decomposition of a (p times p) covariance matrix into the sum of two components. Through a latent factor representation, they can be interpreted as a diagonal matrix of idiosyncratic variances and a shared variation matrix, that is, the product of a (p times k) factor loadings matrix and its transpose. If (k ll p), this defines a parsimonious factorisation of the covariance matrix. Historically, little attention has been paid to incorporating prior information in Bayesian analyses using factor models where, at best, the prior for the factor loadings is order invariant. In this work, a class of structured priors is developed that can encode ideas of dependence structure about the shared variation matrix. The construction allows data-informed shrinkage towards sensible parametric structures while also facilitating inference over the number of factors. Using an unconstrained reparameterisation of stationary vector autoregressions, the methodology is extended to stationary dynamic factor models. For computational inference, parameter-expanded Markov chain Monte Carlo samplers are proposed, including an efficient adaptive Gibbs sampler. Two substantive applications showcase the scope of the methodology and its inferential benefits.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"548 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Birnbaum–Saunders frailty regression models for clustered survival data 用于聚类生存数据的 Birnbaum-Saunders 虚弱回归模型

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-25 DOI: 10.1007/s11222-024-10458-w

Diego I. Gallardo, Marcelo Bourguignon, José S. Romeo

{"title":"Birnbaum–Saunders frailty regression models for clustered survival data","authors":"Diego I. Gallardo, Marcelo Bourguignon, José S. Romeo","doi":"10.1007/s11222-024-10458-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10458-w","url":null,"abstract":"We present a novel frailty model for modeling clustered survival data. In particular, we consider the Birnbaum–Saunders (BS) distribution for the frailty terms with a new directly parameterized on the variance of the frailty distribution. This allows, among other things, compare the estimated frailty terms among traditional models, such as the gamma frailty model. Some mathematical properties of the new model are studied including the conditional distribution of frailties among the survivors, the frailty of individuals dying at time t, and the Kendall’s (tau ) measure. Furthermore, an explicit form to the derivatives of the Laplace transform for the BS distribution using the di Bruno’s formula is found. Parametric, non-parametric and semiparametric versions of the BS frailty model are studied. We use a simple Expectation-Maximization (EM) algorithm to estimate the model parameters and evaluate its performance under different censoring proportion by a Monte Carlo simulation study. We also show that the BS frailty model is competitive over the gamma and weighted Lindley frailty models under misspecification. We illustrate our methodology by using a real data sets.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"148 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A review on the Adaptive-Ridge Algorithm with several extensions 自适应脊算法及若干扩展功能综述

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-25 DOI: 10.1007/s11222-024-10440-6

Rémy Abergel, Olivier Bouaziz, Grégory Nuel

{"title":"A review on the Adaptive-Ridge Algorithm with several extensions","authors":"Rémy Abergel, Olivier Bouaziz, Grégory Nuel","doi":"10.1007/s11222-024-10440-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10440-6","url":null,"abstract":"The Adaptive Ridge Algorithm is an iterative algorithm designed for variable selection. It is also known under the denomination of Iteratively Reweighted Least-Squares Algorithm in the communities of Compressed Sensing and Sparse Signals Recovery. Besides, it can also be interpreted as an optimization algorithm dedicated to the minimization of possibly nonconvex (ell ^q) penalized energies (with (0<q<2)). In the literature, this algorithm can be derived using various mathematical approaches, namely Half Quadratic Minimization, Majorization-Minimization, Alternating Minimization or Local Approximations. In this work, we will show how the Adaptive Ridge Algorithm can be simply derived and analyzed from a single equation, corresponding to a variational reformulation of the (ell ^q) penalty. We will describe in detail how the Adaptive Ridge Algorithm can be numerically implemented and we will perform a thorough experimental study of its parameters. We will also show how the variational formulation of the (ell ^q) penalty combined with modern duality principles can be used to design an interesting variant of the Adaptive Ridge Algorithm dedicated to the minimization of quadratic functions over (nonconvex) (ell ^q) balls.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"26 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Enhancing cure rate analysis through integration of machine learning models: a comparative study 通过整合机器学习模型加强治愈率分析：一项比较研究

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-25 DOI: 10.1007/s11222-024-10456-y

Wisdom Aselisewine, Suvra Pal

{"title":"Enhancing cure rate analysis through integration of machine learning models: a comparative study","authors":"Wisdom Aselisewine, Suvra Pal","doi":"10.1007/s11222-024-10456-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10456-y","url":null,"abstract":"Cure rate models have been thoroughly investigated across various domains, encompassing medicine, reliability, and finance. The merging of machine learning (ML) with cure models is emerging as a promising strategy to improve predictive accuracy and gain profound insights into the underlying mechanisms influencing the probability of cure. The current body of literature has explored the benefits of incorporating a single ML algorithm with cure models. However, there is a notable absence of a comprehensive study that compares the performances of various ML algorithms in this context. This paper seeks to address and bridge this gap. Specifically, we focus on the well-known mixture cure model and examine the incorporation of five distinct ML algorithms: extreme gradient boosting, neural networks, support vector machines, random forests, and decision trees. To bolster the robustness of our comparison, we also include cure models with logistic and spline-based regression. For parameter estimation, we formulate an expectation maximization algorithm. A comprehensive simulation study is conducted across diverse scenarios to compare various models based on the accuracy and precision of estimates for different quantities of interest, along with the predictive accuracy of cure. The results derived from both the simulation study, as well as the analysis of real cutaneous melanoma data, indicate that the incorporation of ML models into cure model provides a beneficial contribution to the ongoing endeavors aimed at improving the accuracy of cure rate estimation.\u0000","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"111 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Gaussian processes for Bayesian inverse problems associated with linear partial differential equations 与线性偏微分方程相关的贝叶斯逆问题的高斯过程

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-24 DOI: 10.1007/s11222-024-10452-2

Tianming Bai, Aretha L. Teckentrup, Konstantinos C. Zygalakis

引用次数: 0

Bounded-memory adjusted scores estimation in generalized linear models with large data sets 具有大型数据集的广义线性模型中的限界内存调整分数估计

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-21 DOI: 10.1007/s11222-024-10447-z

Patrick Zietkiewicz, Ioannis Kosmidis

{"title":"Bounded-memory adjusted scores estimation in generalized linear models with large data sets","authors":"Patrick Zietkiewicz, Ioannis Kosmidis","doi":"10.1007/s11222-024-10447-z","DOIUrl":"https://doi.org/10.1007/s11222-024-10447-z","url":null,"abstract":"The widespread use of maximum Jeffreys’-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression, in particular, are supported by the results of Kosmidis and Firth (Biometrika 108:71–82, 2021. https://doi.org/10.1093/biomet/asaa052), who show that the resulting estimates are always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically as a proportion of the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing O(n) quantities in memory, and can operate with data sets that exceed computer memory or even hard drive capacity. We achieve that through incremental QR decompositions, which enable IWLS iterations to have access only to data chunks of predetermined size. Both procedures can also be readily adapted to fit generalized linear models when distinct parts of the data is stored across different sites and, due to privacy concerns, cannot be fully transferred across sites. We assess the procedures through a real-data application with millions of observations.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"16 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An efficient workflow for modelling high-dimensional spatial extremes 建立高维空间极值模型的高效工作流程

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-19 DOI: 10.1007/s11222-024-10448-y

Silius M. Vandeskog, Sara Martino, Raphaël Huser

引用次数: 0

Model-based clustering with missing not at random data 基于模型的非随机数据缺失聚类

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-06-18 DOI: 10.1007/s11222-024-10444-2

Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse, Christophe Biernacki

{"title":"Model-based clustering with missing not at random data","authors":"Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse, Christophe Biernacki","doi":"10.1007/s11222-024-10444-2","DOIUrl":"https://doi.org/10.1007/s11222-024-10444-2","url":null,"abstract":"Model-based unsupervised learning, as any learning task, stalls as soon as missing data occurs. This is even more true when the missing data are informative, or said missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (continuous, count, categorical and mixed) to jointly model the data distribution and the MNAR mechanism, remaining vigilant to the relative degrees of freedom of each. Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variable themselves and on the class membership. However, we focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership. We first underline its ease of estimation, by showing that the statistical inference can be carried out on the data matrix concatenated with the missing mask considering finally a standard MAR mechanism. Consequently, we propose to perform clustering using the Expectation Maximization algorithm, specially developed for this simplified reinterpretation. Finally, we assess the numerical performances of the proposed methods on synthetic data and on the real medical registry TraumaBase as well.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"46 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0