Statistics and Computing: Latest Articles

A Neural Network Integrated Accelerated Failure Time-Based Mixture Cure Model.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-10-01 Epub Date: 2025-06-22 DOI: 10.1007/s11222-025-10674-y

Wisdom Aselisewine, Suvra Pal

Citations: 0

Bootstrap estimation of the proportion of outliers in robust regression.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-02-01 Epub Date: 2024-11-16 DOI: 10.1007/s11222-024-10526-1

Qiang Heng, Kenneth Lange

Citations: 0

Simulation based composite likelihood.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-02-25 DOI: 10.1007/s11222-025-10584-z

Lorenzo Rimella, Chris Jewell, Paul Fearnhead

Citations: 0

Sequential Bayesian Registration for Functional Data.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-05-27 DOI: 10.1007/s11222-025-10640-8

Yoonji Kim, Oksana A Chkrebtii, Sebastian A Kurtek

Abstract: In many modern applications, discretely-observed data may be naturally understood as a set of functions. Functional data often exhibit two confounded sources of variability: amplitude (y-axis) and phase (x-axis). The extraction of amplitude and phase, a process known as registration, is essential in exploring the underlying structure of functional data in a variety of areas, from environmental monitoring to medical imaging. Critically, such data are often gathered sequentially with new functional observations arriving over time. Despite this, existing registration procedures do not sequentially update inference based on the new data, requiring model refitting. To address these challenges, we introduce a Bayesian framework for sequential registration of functional data, which updates statistical inference as new sets of functions are assimilated. This Bayesian model-based sequential learning approach utilizes sequential Monte Carlo sampling to recursively update the alignment of observed functions while accounting for associated uncertainty. Distributed computing significantly reduces computational cost relative to refitting the model using an iterative method such as Markov chain Monte Carlo on the full data. Simulation studies and comparisons reveal that the proposed approach performs well even when the target posterior distribution has a challenging structure. We apply the proposed method to three real datasets: (1) functions of annual drought intensity near Kaweah River in California, (2) annual sea surface salinity functions near Null Island, and (3) a sequence of repeated patterns in electrocardiogram signals.

Statistics and Computing, vol. 35(4), 108. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12116714/pdf/

Citations: 0

Outcome-guided spike-and-slab Lasso Biclustering: A Novel Approach for Enhancing Biclustering Techniques for Gene Expression Analysis.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-08-28 DOI: 10.1007/s11222-025-10709-4

Luis A Vargas-Mieles, Paul D W Kirk, Chris Wallace

Abstract: Biclustering has gained interest in gene expression data analysis due to its ability to identify groups of samples that exhibit similar behaviour in specific subsets of genes (or vice versa), in contrast to traditional clustering methods that classify samples based on all genes. Despite advances, biclustering remains a challenging problem, even with cutting-edge methodologies. This paper introduces an extension of the recently proposed Spike-and-Slab Lasso Biclustering (SSLB) algorithm, termed Outcome-Guided SSLB (OG-SSLB), aimed at enhancing the identification of biclusters in gene expression analysis. Our proposed approach integrates disease outcomes into the biclustering framework through Bayesian profile regression. By leveraging additional clinical information, OG-SSLB improves the interpretability and relevance of the resulting biclusters. Comprehensive simulations and numerical experiments demonstrate that OG-SSLB achieves superior performance, with improved accuracy in estimating the number of clusters and higher consensus scores compared to the original SSLB method. Furthermore, OG-SSLB effectively identifies meaningful patterns and associations between gene expression profiles and disease states. These promising results demonstrate the effectiveness of OG-SSLB in advancing biclustering techniques, providing a powerful tool for uncovering biologically relevant insights. The OGSSLB software can be found as an R/C++ package at https://github.com/luisvargasmieles/OGSSLB.

Statistics and Computing, vol. 35(6), 179. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12394340/pdf/

Citations: 0

Extended fiducial inference for individual treatment effects via deep neural networks.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-05-17 DOI: 10.1007/s11222-025-10624-8

Sehwan Kim, Faming Liang

Abstract: Individual treatment effect estimation has gained significant attention in recent data science literature. This work introduces the Double Neural Network (Double-NN) method to address this problem within the framework of extended fiducial inference (EFI). In the proposed method, deep neural networks are used to model the treatment and control effect functions, while an additional neural network is employed to estimate their parameters. The universal approximation capability of deep neural networks ensures the broad applicability of this method. Numerical results highlight the superior performance of the proposed Double-NN method compared to the conformal quantile regression (CQR) method in individual treatment effect estimation. From the perspective of statistical inference, this work advances the theory and methodology for statistical inference of large models. Specifically, it is theoretically proven that the proposed method permits the model size to increase with the sample size $n$ at a rate of $O(n^{\zeta})$ for some $0 \le \zeta < 1$, while still maintaining proper quantification of uncertainty in the model parameters. This result marks a significant improvement compared to the range $0 \le \zeta < \frac{1}{2}$ required by the classical central limit theorem. Furthermore, this work provides a rigorous framework for quantifying the uncertainty of deep neural networks under the neural scaling law, representing a substantial contribution to the statistical understanding of large-scale neural network models.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10624-8.

Statistics and Computing, vol. 35(4), 97. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085359/pdf/

Citations: 0

Bayesian shared parameter joint models for heterogeneous populations.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-06-12 DOI: 10.1007/s11222-025-10647-1

Sida Chen, Danilo Alvares, Marco Palma, Jessica K Barrett

Abstract: Joint models (JMs) for longitudinal and time-to-event data are an important class of biostatistical models in health and medical research. When the study population consists of heterogeneous subgroups, standard JMs may be inadequate, leading to misleading results or loss of information. Joint latent class models (JLCMs) and their variants have been proposed to incorporate latent class structures into JMs. JLCMs are useful for identifying latent subgroups, uncovering deeper insights into relationships between the outcomes, and improving prediction performance. We consider the problem of Bayesian inference for the generic form of JLCMs, which poses significant computational challenges due to the complex nature of the posterior distribution. We propose a new Bayesian inference framework to tackle these challenges. Our approach leverages state-of-the-art Markov chain Monte Carlo techniques and parallel computing for parameter estimation and model selection regarding the number of latent classes. Through a simulation study, we demonstrate the feasibility and superiority of our proposed method over the existing approach. Additionally, we provide practical guidance on model and prior specification, which has received little attention, to facilitate the implementation of such complex models. We illustrate our method using data from the PAQUID prospective cohort study, where the outcomes of interest include a longitudinal measurement of cognitive performance and time to dementia diagnosis. Our analysis provides deeper insights into the latent class characteristics underlying the study population.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10647-1.

Statistics and Computing, vol. 35(5), 125. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162714/pdf/

Citations: 0

Online Bayesian changepoint detection for network Poisson processes with community structure.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-04-03 DOI: 10.1007/s11222-025-10606-w

Joshua Corneck, Edward A K Cohen, James S Martin, Francesco Sanna Passino

Abstract: Network point processes often exhibit latent structure that governs the behaviour of the sub-processes. It is not always reasonable to assume that this latent structure is static, and detecting when and how this driving structure changes is often of interest. In this paper, we introduce a novel online methodology for detecting changes within the latent structure of a network point process. We focus on block-homogeneous Poisson processes, where latent node memberships determine the rates of the edge processes. We propose a scalable variational procedure which can be applied on large networks in an online fashion via a Bayesian forgetting factor applied to sequential variational approximations to the posterior distribution. The proposed framework is tested on simulated and real-world data, and it rapidly and accurately detects changes to the latent edge process rates, and to the latent node group memberships, both in an online manner. In particular, in an application on the Santander Cycles bike-sharing network in central London, we detect changes within the network related to holiday periods and lockdown restrictions between 2019 and 2020.

Statistics and Computing, vol. 35(3), 75. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11968509/pdf/

Citations: 0

Using prior-data conflict to tune Bayesian regularized regression models.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-02-20 DOI: 10.1007/s11222-025-10582-1

Timofei Biziaev, Karen Kopciuk, Thierry Chekouo

Abstract: In high-dimensional regression models, variable selection becomes challenging from a computational and theoretical perspective. Bayesian regularized regression via shrinkage priors like the Laplace or spike-and-slab prior are effective methods for variable selection in $p > n$ scenarios provided the shrinkage priors are configured adequately. We propose an empirical Bayes configuration using checks for prior-data conflict: tests that assess whether there is disagreement in parameter information provided by the prior and data. We apply our proposed method to the Bayesian LASSO and spike-and-slab shrinkage priors in the linear regression model and assess the variable selection performance of our prior configurations through a high-dimensional simulation study. Additionally, we apply our method to proteomic data collected from patients admitted to the Albany Medical Center in Albany NY in April of 2020 with COVID-like respiratory issues. Simulation results suggest our proposed configurations may outperform competing models when the true regression effects are small.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10582-1.

Statistics and Computing, vol. 35(2), 53. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11842445/pdf/

Citations: 0

A new p-value based multiple testing procedure for generalized linear models.

IF 1.6, Zone 2 (Mathematics)

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-03-16 DOI: 10.1007/s11222-025-10600-2

Joseph Rilling, Cheng Yong Tang

Abstract: This study introduces a novel p-value-based multiple testing approach tailored for generalized linear models. Despite the crucial role of generalized linear models in statistics, existing methodologies face obstacles arising from the heterogeneous variance of response variables and complex dependencies among estimated parameters. Our aim is to address the challenge of controlling the false discovery rate (FDR) amidst arbitrarily dependent test statistics. Through the development of efficient computational algorithms, we present a versatile statistical framework for multiple testing. The proposed framework accommodates a range of tools developed for constructing a new model matrix in regression-type analysis, including random row permutations and Model-X knockoffs. We devise efficient computing techniques to solve the encountered non-trivial quadratic matrix equations, enabling the construction of paired p-values suitable for the two-step multiple testing procedure proposed by Sarkar and Tang (Biometrika 109(4): 1149-1155, 2022). Theoretical analysis affirms the properties of our approach, demonstrating its capability to control the FDR at a given level. Empirical evaluations further substantiate its promising performance across diverse simulation settings.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10600-2.

Statistics and Computing, vol. 35(3), 69. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11911269/pdf/

Citations: 0