{"title":"Statistical Inference for Chi-square Statistics or F-Statistics Based on Multiple Imputation","authors":"Binhuan Wang, Yixin Fang, Man Jin","doi":"arxiv-2409.10812","DOIUrl":"https://doi.org/arxiv-2409.10812","url":null,"abstract":"Missing data is a common issue in medical, psychiatry, and social studies. In\u0000literature, Multiple Imputation (MI) was proposed to multiply impute datasets\u0000and combine analysis results from imputed datasets for statistical inference\u0000using Rubin's rule. However, Rubin's rule only works for combined inference on\u0000statistical tests with point and variance estimates and is not applicable to\u0000combine general F-statistics or Chi-square statistics. In this manuscript, we\u0000provide a solution to combine F-test statistics from multiply imputed datasets,\u0000when the F-statistic has an explicit fractional form (that is, both the\u0000numerator and denominator of the F-statistic are reported). Then we extend the\u0000method to combine Chi-square statistics from multiply imputed datasets.\u0000Furthermore, we develop methods for two commonly applied F-tests, Welch's ANOVA\u0000and Type-III tests of fixed effects in mixed effects models, which do not have\u0000the explicit fractional form. SAS macros are also developed to facilitate\u0000applications.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decomposing Gaussians with Unknown Covariance","authors":"Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Jacob Bien, Daniela Witten","doi":"arxiv-2409.11497","DOIUrl":"https://doi.org/arxiv-2409.11497","url":null,"abstract":"Common workflows in machine learning and statistics rely on the ability to\u0000partition the information in a data set into independent portions. Recent work\u0000has shown that this may be possible even when conventional sample splitting is\u0000not (e.g., when the number of samples $n=1$, or when observations are not\u0000independent and identically distributed). However, the approaches that are\u0000currently available to decompose multivariate Gaussian data require knowledge\u0000of the covariance matrix. In many important problems (such as in spatial or\u0000longitudinal data analysis, and graphical modeling), the covariance matrix may\u0000be unknown and even of primary interest. Thus, in this work we develop new\u0000approaches to decompose Gaussians with unknown covariance. First, we present a\u0000general algorithm that encompasses all previous decomposition approaches for\u0000Gaussian data as special cases, and can further handle the case of an unknown\u0000covariance. It yields a new and more flexible alternative to sample splitting\u0000when $n>1$. When $n=1$, we prove that it is impossible to partition the\u0000information in a multivariate Gaussian into independent portions without\u0000knowing the covariance matrix. Thus, we use the general algorithm to decompose\u0000a single multivariate Gaussian with unknown covariance into dependent parts\u0000with tractable conditional distributions, and demonstrate their use for\u0000inference and validation. The proposed decomposition strategy extends naturally\u0000to Gaussian processes. 
In simulation and on electroencephalography data, we\u0000apply these decompositions to the tasks of model selection and post-selection\u0000inference in settings where alternative strategies are unavailable.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretability Indices and Soft Constraints for Factor Models","authors":"Justin Philip Tuazon, Gia Mizrane Abubo, Joemari Olea","doi":"arxiv-2409.11525","DOIUrl":"https://doi.org/arxiv-2409.11525","url":null,"abstract":"Factor analysis is a way to characterize the relationships between many\u0000(observable) variables in terms of a smaller number of unobservable random\u0000variables which are called factors. However, the application of factor models\u0000and its success can be subjective or difficult to gauge, since infinitely many\u0000factor models that produce the same correlation matrix can be fit given sample\u0000data. Thus, there is a need to operationalize a criterion that measures how\u0000meaningful or \"interpretable\" a factor model is in order to select the best\u0000among many factor models. While there are already techniques that aim to measure and enhance\u0000interpretability, new indices, as well as rotation methods via mathematical\u0000optimization based on them, are proposed to measure interpretability. The\u0000proposed methods directly incorporate semantics with the help of natural\u0000language processing and are generalized to incorporate any \"prior information\".\u0000Moreover, the indices allow for complete or partial specification of\u0000relationships at a pairwise level. Aside from these, two other main benefits of\u0000the proposed methods are that they do not require the estimation of factor\u0000scores, which avoids the factor score indeterminacy problem, and that no\u0000additional explanatory variables are necessary. The implementation of the proposed methods is written in Python 3 and is made\u0000available together with several helper functions through the package\u0000interpretablefa on the Python Package Index. 
The methods' application is\u0000demonstrated here using data on the Experiences in Close Relationships Scale,\u0000obtained from the Open-Source Psychometrics Project.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"104 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation and imputation of missing data in longitudinal models with Zero-Inflated Poisson response variable","authors":"D. S. Martinez-Lobo, O. O. Melo, N. A. Cruz","doi":"arxiv-2409.11040","DOIUrl":"https://doi.org/arxiv-2409.11040","url":null,"abstract":"This research deals with the estimation and imputation of missing data in\u0000longitudinal models with a Poisson response variable inflated with zeros. A\u0000methodology is proposed that is based on the use of maximum likelihood,\u0000assuming that data is missing at random and that there is a correlation between\u0000the response variables. In each of the times, the expectation maximization (EM)\u0000algorithm is used: in step E, a weighted regression is carried out, conditioned\u0000on the previous times that are taken as covariates. In step M, the estimation\u0000and imputation of the missing data are performed. The good performance of the\u0000methodology in different loss scenarios is demonstrated in a simulation study\u0000comparing the model only with complete data, and estimating missing data using\u0000the mode of the data of each individual. Furthermore, in a study related to the\u0000growth of corn, it is tested on real data to develop the algorithm in a\u0000practical scenario.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"203 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probability-scale residuals for event-time data","authors":"Eric S. Kawaguchi, Bryan E. Shepherd, Chun Li","doi":"arxiv-2409.11385","DOIUrl":"https://doi.org/arxiv-2409.11385","url":null,"abstract":"The probability-scale residual (PSR) is defined as $E{sign(y, Y^*)}$, where\u0000$y$ is the observed outcome and $Y^*$ is a random variable from the fitted\u0000distribution. The PSR is particularly useful for ordinal and censored outcomes\u0000for which fitted values are not available without additional assumptions.\u0000Previous work has defined the PSR for continuous, binary, ordinal,\u0000right-censored, and current status outcomes; however, development of the PSR\u0000has not yet been considered for data subject to general interval censoring. We\u0000develop extensions of the PSR, first to mixed-case interval-censored data, and\u0000then to data subject to several types of common censoring schemes. We derive\u0000the statistical properties of the PSR and show that our more general PSR\u0000encompasses several previously defined PSR for continuous and censored outcomes\u0000as special cases. The performance of the residual is illustrated in real data\u0000from the Caribbean, Central, and South American Network for HIV Epidemiology.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BMRMM: An R Package for Bayesian Markov (Renewal) Mixed Models","authors":"Yutong Wu, Abhra Sarkar","doi":"arxiv-2409.10835","DOIUrl":"https://doi.org/arxiv-2409.10835","url":null,"abstract":"We introduce the BMRMM package implementing Bayesian inference for a class of\u0000Markov renewal mixed models which can characterize the stochastic dynamics of a\u0000collection of sequences, each comprising alternative instances of categorical\u0000states and associated continuous duration times, while being influenced by a\u0000set of exogenous factors as well as a 'random' individual. The default setting\u0000flexibly models the state transition probabilities using mixtures of Dirichlet\u0000distributions and the duration times using mixtures of gamma kernels while also\u0000allowing variable selection for both. Modeling such data using simpler Markov\u0000mixed models also remains an option, either by ignoring the duration times\u0000altogether or by replacing them with instances of an additional category\u0000obtained by discretizing them by a user-specified unit. The option is also\u0000useful when data on duration times may not be available in the first place. We\u0000demonstrate the package's utility using two data sets.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of Cross-Validated Targeted Maximum Likelihood Estimation","authors":"Matthew J. Smith, Rachael V. Phillips, Camille Maringe, Miguel Angel Luque Fernandez","doi":"arxiv-2409.11265","DOIUrl":"https://doi.org/arxiv-2409.11265","url":null,"abstract":"Background: Advanced methods for causal inference, such as targeted maximum\u0000likelihood estimation (TMLE), require certain conditions for statistical\u0000inference. However, in situations where there is not differentiability due to\u0000data sparsity or near-positivity violations, the Donsker class condition is\u0000violated. In such situations, TMLE variance can suffer from inflation of the\u0000type I error and poor coverage, leading to conservative confidence intervals.\u0000Cross-validation of the TMLE algorithm (CVTMLE) has been suggested to improve\u0000on performance compared to TMLE in settings of positivity or Donsker class\u0000violations. We aim to investigate the performance of CVTMLE compared to TMLE in\u0000various settings. Methods: We utilised the data-generating mechanism as described in Leger et\u0000al. (2022) to run a Monte Carlo experiment under different Donsker class\u0000violations. Then, we evaluated the respective statistical performances of TMLE\u0000and CVTMLE with different super learner libraries, with and without regression\u0000tree methods. Results: We found that CVTMLE vastly improves confidence interval coverage\u0000without adversely affecting bias, particularly in settings with small sample\u0000sizes and near-positivity violations. Furthermore, incorporating regression\u0000trees using standard TMLE with ensemble super learner-based initial estimates\u0000increases bias and variance leading to invalid statistical inference. 
Conclusions: It has been shown that when using CVTMLE the Donsker class\u0000condition is no longer necessary to obtain valid statistical inference when\u0000using regression trees and under either data sparsity or near-positivity\u0000violations. We show through simulations that CVTMLE is much less sensitive to\u0000the choice of the super learner library and thereby provides better estimation\u0000and inference in cases where the super learner library uses more flexible\u0000candidates and is prone to overfitting.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible survival regression with variable selection for heterogeneous population","authors":"Abhishek Mandal, Abhisek Chakraborty","doi":"arxiv-2409.10771","DOIUrl":"https://doi.org/arxiv-2409.10771","url":null,"abstract":"Survival regression is widely used to model time-to-events data, to explore\u0000how covariates may influence the occurrence of events. Modern datasets often\u0000encompass a vast number of covariates across many subjects, with only a subset\u0000of the covariates significantly affecting survival. Additionally, subjects\u0000often belong to an unknown number of latent groups, where covariate effects on\u0000survival differ significantly across groups. The proposed methodology addresses\u0000both challenges by simultaneously identifying the latent sub-groups in the\u0000heterogeneous population and evaluating covariate significance within each\u0000sub-group. This approach is shown to enhance the predictive accuracy for\u0000time-to-event outcomes, via uncovering varying risk profiles within the\u0000underlying heterogeneous population and is thereby helpful to device targeted\u0000disease management strategies.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"bayesCureRateModel: Bayesian Cure Rate Modeling for Time to Event Data in R","authors":"Panagiotis Papastamoulis, Fotios Milienos","doi":"arxiv-2409.10221","DOIUrl":"https://doi.org/arxiv-2409.10221","url":null,"abstract":"The family of cure models provides a unique opportunity to simultaneously\u0000model both the proportion of cured subjects (those not facing the event of\u0000interest) and the distribution function of time-to-event for susceptibles\u0000(those facing the event). In practice, the application of cure models is mainly\u0000facilitated by the availability of various R packages. However, most of these\u0000packages primarily focus on the mixture or promotion time cure rate model. This\u0000article presents a fully Bayesian approach implemented in R to estimate a\u0000general family of cure rate models in the presence of covariates. It builds\u0000upon the work by Papastamoulis and Milienos (2024) by additionally considering\u0000various options for describing the promotion time, including the Weibull,\u0000exponential, Gompertz, log-logistic and finite mixtures of gamma distributions,\u0000among others. Moreover, the user can choose any proper distribution function\u0000for modeling the promotion time (provided that some specific conditions are\u0000met). Posterior inference is carried out by constructing a Metropolis-coupled\u0000Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the\u0000latent cure indicators and Metropolis-Hastings steps with Langevin diffusion\u0000dynamics for parameter updates. The main MCMC algorithm is embedded within a\u0000parallel tempering scheme by considering heated versions of the target\u0000posterior distribution. 
The package is illustrated on a real dataset analyzing\u0000the duration of the first marriage under the presence of various covariates\u0000such as the race, age and the presence of kids.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"183 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
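The parallel tempering (Metropolis-coupled MCMC) scheme mentioned in this abstract can be sketched by its between-chain swap move: chain k targets the posterior raised to the power 1/temps[k], and adjacent chains occasionally exchange states. This is a generic sketch with illustrative names, not the package's internal implementation:

```python
import math
import random

def try_swap(states, logpost, temps, rng=random):
    """One swap move in a parallel-tempering (Metropolis-coupled MCMC)
    scheme. Chain k targets pi(x)^(1/temps[k]); temps[0] = 1 is the
    cold chain whose samples are kept for inference."""
    k = rng.randrange(len(states) - 1)            # propose swapping chains k, k+1
    lp_k, lp_k1 = logpost(states[k]), logpost(states[k + 1])
    beta_k, beta_k1 = 1.0 / temps[k], 1.0 / temps[k + 1]
    # Acceptance ratio: exp[(beta_k - beta_{k+1}) * (lp_{k+1} - lp_k)]
    log_alpha = (beta_k - beta_k1) * (lp_k1 - lp_k)
    if math.log(rng.random()) < log_alpha:
        states[k], states[k + 1] = states[k + 1], states[k]
    return states
```

Heated chains explore flattened versions of the posterior and pass good states down to the cold chain, which helps the sampler escape local modes of multimodal cure rate posteriors.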
{"title":"Generalized Matrix Factor Model","authors":"Xinbing Kong, Tong Zhang","doi":"arxiv-2409.10001","DOIUrl":"https://doi.org/arxiv-2409.10001","url":null,"abstract":"This article introduces a nonlinear generalized matrix factor model (GMFM)\u0000that allows for mixed-type variables, extending the scope of linear matrix\u0000factor models (LMFM) that are so far limited to handling continuous variables.\u0000We introduce a novel augmented Lagrange multiplier method, equivalent to the\u0000constraint maximum likelihood estimation, and carefully tailored to be locally\u0000concave around the true factor and loading parameters. This statistically\u0000guarantees the local convexity of the negative Hessian matrix around the true\u0000parameters of the factors and loadings, which is nontrivial in the matrix\u0000factor modeling and leads to feasible central limit theorems of the estimated\u0000factors and loadings. We also theoretically establish the convergence rates of\u0000the estimated factor and loading matrices for the GMFM under general conditions\u0000that allow for correlations across samples, rows, and columns. Moreover, we\u0000provide a model selection criterion to determine the numbers of row and column\u0000factors consistently. To numerically compute the constraint maximum likelihood\u0000estimator, we provide two algorithms: two-stage alternating maximization and\u0000minorization maximization. Extensive simulation studies demonstrate GMFM's\u0000superiority in handling discrete and mixed-type variables. 
An empirical data\u0000analysis of the company's operating performance shows that GMFM does clustering\u0000and reconstruction well in the presence of discontinuous entries in the data\u0000matrix.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}