Statistics and Computing最新文献_第2页

Estimation and model selection for finite mixtures of Tukey's g- &-h distributions. Tukey的g- &-h分布的有限混合估计和模型选择。

IF 1.6 2区数学

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-03-15 DOI: 10.1007/s11222-025-10596-9

Tingting Zhan, Misung Yi, Amy R Peck, Hallgeir Rui, Inna Chervoneva

{"title":"Estimation and model selection for finite mixtures of Tukey's g- &-h distributions.","authors":"Tingting Zhan, Misung Yi, Amy R Peck, Hallgeir Rui, Inna Chervoneva","doi":"10.1007/s11222-025-10596-9","DOIUrl":"10.1007/s11222-025-10596-9","url":null,"abstract":"A finite mixture of distributions is a popular statistical model, which is especially meaningful when the population of interest may include distinct subpopulations. This work is motivated by analysis of protein expression levels quantified using immunofluorescence immunohistochemistry assays of human tissues. The distributions of cellular protein expression levels in a tissue often exhibit multimodality, skewness and heavy tails, but there is a substantial variability between distributions in different tissues from different subjects, while some of these mixture distributions include components consistent with the assumption of a normal distribution. To accommodate such diversity, we propose a mixture of 4-parameter Tukey's g- &-h distributions for fitting finite mixtures with both Gaussian and non-Gaussian components. Tukey's g- &-h distribution is a flexible model that allows variable degree of skewness and kurtosis in mixture components, including normal distribution as a particular case. Since the likelihood of the Tukey's g- &-h mixtures does not have a closed analytical form, we propose a quantile least Mahalanobis distance (QLMD) estimator for parameters of such mixtures. QLMD is an indirect estimator minimizing the Mahalanobis distance between the sample and model-based quantiles, and its asymptotic properties follow from the general theory of indirect estimation. We have developed a stepwise algorithm to select a parsimonious Tukey's g- &-h mixture model and implemented all proposed methods in the R package QuantileGH available on CRAN. A simulation study was conducted to evaluate performance of the Tukey's g- &-h mixtures and compare to performance of mixtures of skew-normal or skew-t distributions. The Tukey's g- &-h mixtures were applied to model cellular expressions of Cyclin D1 protein in breast cancer tissues, and resulting parameter estimates evaluated as predictors of progression-free survival.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"35 3","pages":"67"},"PeriodicalIF":1.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11910465/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143650810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Bayesian additive tree ensembles for composite quantile regressions. 复合分位数回归的贝叶斯加性树集成。

IF 1.6 2区数学

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2025-08-26 DOI: 10.1007/s11222-025-10711-w

Yaeji Lim, Ruijin Lu, Madeleine St Ville, Zhen Chen

{"title":"Bayesian additive tree ensembles for composite quantile regressions.","authors":"Yaeji Lim, Ruijin Lu, Madeleine St Ville, Zhen Chen","doi":"10.1007/s11222-025-10711-w","DOIUrl":"https://doi.org/10.1007/s11222-025-10711-w","url":null,"abstract":"In this paper, we introduce a novel approach that integrates Bayesian additive regression trees (BART) with the composite quantile regression (CQR) framework, creating a robust method for modeling complex relationships between predictors and outcomes under various error distributions. Unlike traditional quantile regression, which focuses on specific quantile levels, our proposed method, composite quantile BART, offers greater flexibility in capturing the entire conditional distribution of the response variable. By leveraging the strengths of BART and CQR, the proposed method provides enhanced predictive performance, especially in the presence of heavy-tailed errors and non-linear covariate effects. Numerical studies confirm that the proposed composite quantile BART method generally outperforms classical BART, quantile BART, and composite quantile linear regression models in terms of RMSE, especially under heavy-tailed or contaminated error distributions. Notably, under contaminated normal errors, it reduces RMSE by approximately 17% compared to composite quantile regression, and by 27% compared to classical BART.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"35 6","pages":"175"},"PeriodicalIF":1.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12380950/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144969678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

funBIalign: a hierachical algorithm for functional motif discovery based on mean squared residue scores. funBIalign：基于平均残基平方得分的功能主题发现分层算法。

IF 1.6 2区数学

Statistics and Computing Pub Date : 2025-01-01 Epub Date: 2024-12-10 DOI: 10.1007/s11222-024-10537-y

Jacopo Di Iorio, Marzia A Cremona, Francesca Chiaromonte

{"title":"funBIalign: a hierachical algorithm for functional motif discovery based on mean squared residue scores.","authors":"Jacopo Di Iorio, Marzia A Cremona, Francesca Chiaromonte","doi":"10.1007/s11222-024-10537-y","DOIUrl":"10.1007/s11222-024-10537-y","url":null,"abstract":"Motif discovery is gaining increasing attention in the domain of functional data analysis. Functional motifs are typical \"shapes\" or \"patterns\" that recur multiple times in different portions of a single curve and/or in misaligned portions of multiple curves. In this paper, we define functional motifs using an additive model and we propose funBIalign for their discovery and evaluation. Inspired by clustering and biclustering techniques, funBIalign is a multi-step procedure which uses agglomerative hierarchical clustering with complete linkage and a functional distance based on mean squared residue scores to discover functional motifs, both in a single curve (e.g., time series) and in a set of curves. We assess its performance and compare it to other recent methods through extensive simulations. Moreover, we use funBIalign for discovering motifs in two real-data case studies; one on food price inflation and one on temperature changes.Supplementary information: The online version contains supplementary material available at 10.1007/s11222-024-10537-y.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"35 1","pages":"11"},"PeriodicalIF":1.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632007/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Hidden Markov models for multivariate panel data 多元面板数据的隐马尔可夫模型

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-09-18 DOI: 10.1007/s11222-024-10462-0

Mackenzie R. Neal, Alexa A. Sochaniwsky, Paul D. McNicholas

引用次数: 0

Accelerated failure time models with error-prone response and nonlinear covariates 具有易出错响应和非线性协变量的加速故障时间模型

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-09-18 DOI: 10.1007/s11222-024-10491-9

Li-Pang Chen

{"title":"Accelerated failure time models with error-prone response and nonlinear covariates","authors":"Li-Pang Chen","doi":"10.1007/s11222-024-10491-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10491-9","url":null,"abstract":"As a specific application of survival analysis, one of main interests in medical studies aims to analyze the patients’ survival time of a specific cancer. Typically, gene expressions are treated as covariates to characterize the survival time. In the framework of survival analysis, the accelerated failure time model in the parametric form is perhaps a common approach. However, gene expressions are possibly nonlinear and the survival time as well as censoring status are subject to measurement error. In this paper, we aim to tackle those complex features simultaneously. We first correct for measurement error in survival time and censoring status, and use them to develop a corrected Buckley–James estimator. After that, we use the boosting algorithm with the cubic spline estimation method to iteratively recover nonlinear relationship between covariates and survival time. Theoretically, we justify the validity of measurement error correction and estimation procedure. Numerical studies show that the proposed method improves the performance of estimation and is able to capture informative covariates. The methodology is primarily used to analyze the breast cancer data provided by the Netherlands Cancer Institute for research.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Sequential model identification with reversible jump ensemble data assimilation method 采用可逆跃迁集合数据同化方法进行序列模型识别

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-09-18 DOI: 10.1007/s11222-024-10499-1

Yue Huan, Hai Xiang Lin

引用次数: 0

Shrinkage for extreme partial least-squares 极端部分最小二乘法的收缩

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-09-17 DOI: 10.1007/s11222-024-10490-w

Julyan Arbel, Stéphane Girard, Hadrien Lorenzo

{"title":"Shrinkage for extreme partial least-squares","authors":"Julyan Arbel, Stéphane Girard, Hadrien Lorenzo","doi":"10.1007/s11222-024-10490-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10490-w","url":null,"abstract":"This work focuses on dimension-reduction techniques for modelling conditional extreme values. Specifically, we investigate the idea that extreme values of a response variable can be explained by nonlinear functions derived from linear projections of an input random vector. In this context, the estimation of projection directions is examined, as approached by the extreme partial least squares (EPLS) method—an adaptation of the original partial least squares (PLS) method tailored to the extreme-value framework. Further, a novel interpretation of EPLS directions as maximum likelihood estimators is introduced, utilizing the von Mises–Fisher distribution applied to hyperballs. The dimension reduction process is enhanced through the Bayesian paradigm, enabling the incorporation of prior information into the projection direction estimation. The maximum a posteriori estimator is derived in two specific cases, elucidating it as a regularization or shrinkage of the EPLS estimator. We also establish its asymptotic behavior as the sample size approaches infinity. A simulation data study is conducted in order to assess the practical utility of our proposed method. This clearly demonstrates its effectiveness even in moderate data problems within high-dimensional settings. Furthermore, we provide an illustrative example of the method’s applicability using French farm income data, highlighting its efficacy in real-world scenarios.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"205 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Nonconvex Dantzig selector and its parallel computing algorithm 非凸丹齐格选择器及其并行计算算法

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-09-16 DOI: 10.1007/s11222-024-10492-8

Jiawei Wen, Songshan Yang, Delin Zhao

{"title":"Nonconvex Dantzig selector and its parallel computing algorithm","authors":"Jiawei Wen, Songshan Yang, Delin Zhao","doi":"10.1007/s11222-024-10492-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10492-8","url":null,"abstract":"The Dantzig selector is a popular (ell _1)-type variable selection method widely used across various research fields. However, (ell _1)-type methods may not perform well for variable selection without complex irrepresentable conditions. In this article, we introduce a nonconvex Dantzig selector for ultrahigh-dimensional linear models. We begin by demonstrating that the oracle estimator serves as a local optimum for the nonconvex Dantzig selector. In addition, we propose a one-step local linear approximation estimator, called the Dantzig-LLA estimator, for the nonconvex Dantzig selector, and establish its strong oracle property. The proposed regularization method avoids the restrictive conditions imposed by (ell _1) regularization methods to guarantee the model selection consistency. Furthermore, we propose an efficient and parallelizable computing algorithm based on feature-splitting to address the computational challenges associated with the nonconvex Dantzig selector in high-dimensional settings. A comprehensive numerical study is conducted to evaluate the performance of the nonconvex Dantzig selector and the computing efficiency of the feature-splitting algorithm. The results demonstrate that the Dantzig selector with nonconvex penalty outperforms the (ell _1) penalty-based selector, and the feature-splitting algorithm performs well in high-dimensional settings where linear programming solver may fail. Finally, we generalize the concept of nonconvex Dantzig selector to deal with more general loss functions.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"1 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Robust singular value decomposition with application to video surveillance background modelling 将鲁棒奇异值分解应用于视频监控背景建模

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-09-11 DOI: 10.1007/s11222-024-10493-7

Subhrajyoty Roy, Abhik Ghosh, Ayanendranath Basu

{"title":"Robust singular value decomposition with application to video surveillance background modelling","authors":"Subhrajyoty Roy, Abhik Ghosh, Ayanendranath Basu","doi":"10.1007/s11222-024-10493-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10493-7","url":null,"abstract":"The traditional method of computing singular value decomposition (SVD) of a data matrix is based on the least squares principle and is, therefore, very sensitive to the presence of outliers. Hence, the resulting inferences across different applications using the classical SVD are extremely degraded in the presence of data contamination. In particular, background modelling of video surveillance data in the presence of camera tampering cannot be reliably solved by the classical SVD. In this paper, we propose a novel robust singular value decomposition technique based on the popular minimum density power divergence estimator. We have established the theoretical properties of the proposed estimator such as convergence, equivariance and consistency under the high-dimensional regime where both the row and column dimensions of the data matrix approach infinity. We also propose a fast and scalable algorithm based on alternating weighted regression to obtain the estimate. Within the scope of our fairly extensive simulation studies, our method performs better than existing robust SVD algorithms. Finally, we present an application of the proposed method on the video surveillance background modelling problem.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"39 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Optimal confidence interval for the difference between proportions 比例差异的最佳置信区间

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-09-02 DOI: 10.1007/s11222-024-10485-7

Almog Peer, David Azriel

{"title":"Optimal confidence interval for the difference between proportions","authors":"Almog Peer, David Azriel","doi":"10.1007/s11222-024-10485-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10485-7","url":null,"abstract":"Estimating the probability of the binomial distribution is a basic problem, which appears in almost all introductory statistics courses and is performed frequently in various studies. In some cases, the parameter of interest is a difference between two probabilities, and the current work studies the construction of confidence intervals for this parameter when the sample size is small. Our goal is to find the shortest confidence intervals under the constraint of coverage probability being at least as large as a predetermined level. For the two-sample case, there is no known algorithm that achieves this goal, but different heuristics procedures have been suggested, and the present work aims at finding optimal confidence intervals. In the one-sample case, there is a known algorithm that finds optimal confidence intervals presented by Blyth and Still (J Am Stat Assoc 78(381):108–116, 1983). It is based on solving small and local optimization problems and then using an inversion step to find the global optimum solution. We show that this approach fails in the two-sample case and therefore, in order to find optimal confidence intervals, one needs to solve a global optimization problem, rather than small and local ones, which is computationally much harder. We present and discuss the suitable global optimization problem. Using the Gurobi package we find near-optimal solutions when the sample sizes are smaller than 15, and we compare these solutions to some existing methods, both approximate and exact. We find that the improvement in terms of lengths with respect to the best competitor varies between 1.5 and 5% for different parameters of the problem. Therefore, we recommend the use of the new confidence intervals when both sample sizes are smaller than 15. Tables of the confidence intervals are given in the Excel file in this link (https://technionmail-my.sharepoint.com/:f:/g/personal/ap_campus_technion_ac_il/El-213Kms51BhQxR8MmQJCYBDfIsvtrK9mQIey1sZnZWIQ?e=hxGunl).","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"9 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0