{"title":"A communication-efficient, online changepoint detection method for monitoring distributed sensor networks","authors":"Ziyang Yang, Idris A. Eckley, Paul Fearnhead","doi":"10.1007/s11222-024-10428-2","DOIUrl":"https://doi.org/10.1007/s11222-024-10428-2","url":null,"abstract":"<p>We consider the challenge of efficiently detecting changes within a network of sensors, where we also need to minimise communication between sensors and the cloud. We propose an online, communication-efficient method to detect such changes. The procedure works by performing likelihood ratio tests at each time point, and two thresholds are chosen to filter unimportant test statistics and make decisions based on the aggregated test statistics respectively. We provide asymptotic theory concerning consistency and the asymptotic distribution if there are no changes. Simulation results suggest that our method can achieve similar performance to the idealised setting, where we have no constraints on communication between sensors, but substantially reduce the transmission costs.\u0000</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"1 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parsimonious consensus hierarchies, partitions and fuzzy partitioning of a set of hierarchies","authors":"Ilaria Bombelli, Maurizio Vichi","doi":"10.1007/s11222-024-10415-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10415-7","url":null,"abstract":"<p>Methodology is described for fitting a fuzzy partition and a parsimonious consensus hierarchy (ultrametric matrix) to a set of hierarchies of the same set of objects. A model defining a fuzzy partition of a set of hierarchical classifications, with every class of the partition synthesized by a parsimonious consensus hierarchy is described. Each consensus includes an optimal consensus hard partition of objects and all the hierarchical agglomerative aggregations among the clusters of the consensus partition. The performances of the methodology are illustrated by an extended simulation study and applications to real data. A discussion is provided on the new methodology and some interesting future developments are described.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"109 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reversed particle filtering for hidden markov models","authors":"Frank Rotiroti, Stephen G. Walker","doi":"10.1007/s11222-024-10426-4","DOIUrl":"https://doi.org/10.1007/s11222-024-10426-4","url":null,"abstract":"<p>We present an approach to selecting the distributions in sampling-resampling which improves the efficiency of the weighted bootstrap. To complement the standard scheme of sampling from the prior and reweighting with the likelihood, we introduce a reversed scheme, which samples from the (normalized) likelihood and reweights with the prior. We begin with some motivating examples, before developing the relevant theory. We then apply the approach to the particle filtering of time series, including nonlinear and non-Gaussian Bayesian state-space models, a task that demands efficiency, given the repeated application of the weighted bootstrap. Through simulation studies on a normal dynamic linear model, Poisson hidden Markov model, and stochastic volatility model, we demonstrate the gains in efficiency obtained by the approach, involving the choice of the standard or reversed filter. In addition, for the stochastic volatility model, we provide three real-data examples, including a comparison with importance sampling methods that attempt to incorporate information about the data indirectly into the standard filtering scheme and an extension to multivariate models. We determine that the reversed filtering scheme offers an advantage over such auxiliary methods owing to its ability to incorporate information about the data directly into the sampling, an ability that further facilitates its performance in higher-dimensional settings.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"2015 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Bayesian high-dimensional covariate selection in non-linear mixed-effects models using the SAEM algorithm","authors":"Marion Naveau, Guillaume Kon Kam King, Renaud Rincent, Laure Sansonnet, Maud Delattre","doi":"10.1007/s11222-024-10421-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10421-9","url":null,"abstract":"","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"15 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Screen then select: a strategy for correlated predictors in high-dimensional quantile regression","authors":"Xuejun Jiang, Yakun Liang, Haofeng Wang","doi":"10.1007/s11222-024-10424-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10424-6","url":null,"abstract":"<p>Strong correlation among predictors and heavy-tailed noises pose a great challenge in the analysis of ultra-high dimensional data. Such challenge leads to an increase in the computation time for discovering active variables and a decrease in selection accuracy. To address this issue, we propose an innovative two-stage screen-then-select approach and its derivative procedure based on a robust quantile regression with sparsity assumption. This approach initially screens important features by ranking quantile ridge estimation and subsequently employs a likelihood-based post-screening selection strategy to refine variable selection. Additionally, we conduct an internal competition mechanism along the greedy search path to enhance the robustness of algorithm against the design dependence. Our methods are simple to implement and possess numerous desirable properties from theoretical and computational standpoints. Theoretically, we establish the strong consistency of feature selection for the proposed methods under some regularity conditions. In empirical studies, we assess the finite sample performance of our methods by comparing them with utility screening approaches and existing penalized quantile regression methods. Furthermore, we apply our methods to identify genes associated with anticancer drug sensitivities for practical guidance.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"53 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"R-VGAL: a sequential variational Bayes algorithm for generalised linear mixed models","authors":"Bao Anh Vu, David Gunawan, Andrew Zammit-Mangion","doi":"10.1007/s11222-024-10422-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10422-8","url":null,"abstract":"<p>Models with random effects, such as generalised linear mixed models (GLMMs), are often used for analysing clustered data. Parameter inference with these models is difficult because of the presence of cluster-specific random effects, which must be integrated out when evaluating the likelihood function. Here, we propose a sequential variational Bayes algorithm, called Recursive Variational Gaussian Approximation for Latent variable models (R-VGAL), for estimating parameters in GLMMs. The R-VGAL algorithm operates on the data sequentially, requires only a single pass through the data, and can provide parameter updates as new data are collected without the need of re-processing the previous data. At each update, the R-VGAL algorithm requires the gradient and Hessian of a “partial” log-likelihood function evaluated at the new observation, which are generally not available in closed form for GLMMs. To circumvent this issue, we propose using an importance-sampling-based approach for estimating the gradient and Hessian via Fisher’s and Louis’ identities. We find that R-VGAL can be unstable when traversing the first few data points, but that this issue can be mitigated by introducing a damping factor in the initial steps of the algorithm. Through illustrations on both simulated and real datasets, we show that R-VGAL provides good approximations to posterior distributions, that it can be made robust through damping, and that it is computationally efficient.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"35 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated generation of initial points for adaptive rejection sampling of log-concave distributions","authors":"Jonathan James","doi":"10.1007/s11222-024-10425-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10425-5","url":null,"abstract":"<p>Adaptive rejection sampling requires that users provide points that span the distribution’s mode. If these points are far from the mode, it significantly increases computational costs. This paper introduces a simple, automated approach for selecting initial points that uses numerical optimization to quickly bracket the mode. When an initial point is given that resides in a high-density area, the method often requires just four function evaluations to draw a sample—just one more than the sampler’s minimum. This feature makes it well-suited for Gibbs sampling, where the previous round’s draw can serve as the starting point.\u0000</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"2015 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parsimonious ultrametric Gaussian mixture models","authors":"Carlo Cavicchia, Maurizio Vichi, Giorgia Zaccaria","doi":"10.1007/s11222-024-10405-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10405-9","url":null,"abstract":"<p>Gaussian mixture models represent a conceptually and mathematically elegant class of models for casting the density of a heterogeneous population where the observed data is collected from a population composed of a finite set of <i>G</i> homogeneous subpopulations with a Gaussian distribution. A limitation of these models is that they suffer from the curse of dimensionality, and the number of parameters becomes easily extremely large in the presence of high-dimensional data. In this paper, we propose a class of parsimonious Gaussian mixture models with constrained extended ultrametric covariance structures that are capable of exploring hierarchical relations among variables. The proposal shows to require a reduced number of parameters to be fit and includes constrained covariance structures across and within components that further reduce the number of parameters of the model.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic three-term conjugate gradient method with variance technique for non-convex learning","authors":"Chen Ouyang, Chenkaixiang Lu, Xiong Zhao, Ruping Huang, Gonglin Yuan, Yiyan Jiang","doi":"10.1007/s11222-024-10409-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10409-5","url":null,"abstract":"<p>In the training process of machine learning, the minimization of the empirical risk loss function is often used to measure the difference between the model’s predicted value and the real value. Stochastic gradient descent is very popular for this type of optimization problem, but converges slowly in theoretical analysis. To solve this problem, there are already many algorithms with variance reduction techniques, such as SVRG, SAG, SAGA, etc. Some scholars apply the conjugate gradient method in traditional optimization to these algorithms, such as CGVR, SCGA, SCGN, etc., which can basically achieve linear convergence speed, but these conclusions often need to be established under some relatively strong assumptions. In traditional optimization, the conjugate gradient method often requires the use of line search techniques to achieve good experimental results. In a sense, line search embodies some properties of the conjugate methods. Taking inspiration from this, we apply the modified three-term conjugate gradient method and line search technique to machine learning. In our theoretical analysis, we obtain the same convergence rate as SCGA under weaker conditional assumptions. We also test the convergence of our algorithm using two non-convex machine learning models.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"27 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140311638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel sampling method for the von Mises–Fisher distribution","authors":"","doi":"10.1007/s11222-024-10419-3","DOIUrl":"https://doi.org/10.1007/s11222-024-10419-3","url":null,"abstract":"<h3>Abstract</h3> <p>The von Mises–Fisher distribution is a widely used probability model in directional statistics. An algorithm for generating pseudo-random vectors from this distribution was suggested by Wood (Commun Stat Simul Comput 23(1):157–164, 1994), which is based on a rejection sampling scheme. This paper proposes an alternative to this rejection sampling approach for drawing pseudo-random vectors from arbitrary von Mises–Fisher distributions. A useful mixture representation is derived, which is a mixture of beta distributions with mixing weights that follow a confluent hypergeometric distribution. A condensed table-lookup method is adopted for sampling from the confluent hypergeometric distribution. A theoretical analysis investigates the amount of computation required to construct the condensed lookup table. Through numerical experiments, we demonstrate that the proposed algorithm outperforms the rejection-based method when generating a large number of pseudo-random vectors from the same distribution.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"1 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140301529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}