arXiv - STAT - Methodology: Latest Articles, Page 5

Dynamic Bayesian Networks with Conditional Dynamics in Edge Addition and Deletion

arXiv - STAT - Methodology Pub Date : 2024-09-13 DOI: arxiv-2409.08965

Lupe S. H. Chan, Amanda M. Y. Chu, Mike K. P. So

Citations: 0

Fused $L_{1/2}$ prior for large scale linear inverse problem with Gibbs bouncy particle sampler

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.07874

Xiongwen Ke, Yanan Fan, Qingping Zhou

Abstract: In this paper, we study a Bayesian approach for solving large-scale linear inverse problems arising in various scientific and engineering fields. We propose a fused $L_{1/2}$ prior with edge-preserving and sparsity-promoting properties and show that it can be formulated as a Gaussian mixture Markov random field. Since the density function of this family of priors is neither log-concave nor Lipschitz, gradient-based Markov chain Monte Carlo methods cannot be applied to sample the posterior. Thus, we present a Gibbs sampler in which all the conditional posteriors involved have closed-form expressions. The Gibbs sampler works well for small problems but is computationally intractable for large-scale problems because it needs to sample from a high-dimensional Gaussian distribution. To reduce the computational burden, we construct a Gibbs bouncy particle sampler (Gibbs-BPS) based on a piecewise deterministic Markov process. This new sampler combines elements of the Gibbs sampler with the bouncy particle sampler, and its computational complexity is an order of magnitude smaller. We show that the new sampler converges to the target distribution. With computed tomography examples, we demonstrate that the proposed method shows competitive performance with existing popular Bayesian methods and is highly efficient in large-scale problems.

Citations: 0

Review of Recent Advances in Gaussian Process Regression Methods

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.08112

Chenyi Lyu, Xingchi Liu, Lyudmila Mihaylova

Citations: 0

Community detection in multi-layer networks by regularized debiased spectral clustering

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.07956

Huan Qing

Citations: 0

Multiple tests for restricted mean time lost with competing risks data

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.07917

Merle Munko, Dennis Dobler, Marc Ditzhaus

Abstract: Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to two-sample comparisons and mostly to two event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, e.g., when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real data example about leukemia patients who underwent bone marrow transplantation.
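
The definition of the RMTL as the area under the cumulative incidence function up to a time point can be made concrete with a small sketch. The code below is not taken from the paper: it assumes the standard Aalen-Johansen estimator of the cumulative incidence function, and the function name `rmtl`, the coding of censoring as cause `0`, and the toy data are all hypothetical. Ties are handled by grouping equal observed times.

```python
def rmtl(times, causes, cause, tau):
    """Area under the Aalen-Johansen cumulative incidence curve on [0, tau].

    times  : observed times (event or censoring)
    causes : 0 for a censored observation, 1..K for the event type (assumed coding)
    cause  : event type of interest (must be non-zero)
    tau    : prespecified upper time limit
    """
    data = sorted(zip(times, causes))
    n = len(data)
    surv = 1.0   # all-cause Kaplan-Meier survival just before the current time
    area = 0.0
    i = 0
    while i < n and data[i][0] <= tau:
        t = data[i][0]
        at_risk = n - i
        d_all = 0    # events of any type at time t
        d_cause = 0  # events of the type of interest at time t
        j = i
        while j < n and data[j][0] == t:   # group tied observations
            if data[j][1] != 0:
                d_all += 1
            if data[j][1] == cause:
                d_cause += 1
            j += 1
        jump = surv * d_cause / at_risk    # increment of the incidence curve at t
        area += jump * (tau - t)           # a jump at t contributes for tau - t time units
        surv *= 1.0 - d_all / at_risk
        i = j
    return area


# Hypothetical toy data with a tie at time 2 and one censored observation.
toy_times = [1, 2, 2, 3, 4]
toy_causes = [1, 2, 1, 0, 1]
rmtl_cause_1 = rmtl(toy_times, toy_causes, cause=1, tau=4)
rmtl_cause_2 = rmtl(toy_times, toy_causes, cause=2, tau=4)
```

On this toy data the two cause-specific values add up to the time limit minus the all-cause restricted mean survival time, which is a quick sanity check for an implementation of this kind.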

Citations: 0

Causal inference and racial bias in policing: New estimands and the importance of mobility data

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.08059

Zhuochao Huang, Brenden Beck, Joseph Antonelli

Abstract: Studying racial bias in policing is a critically important problem, but one that comes with a number of inherent difficulties due to the nature of the available data. In this manuscript we tackle multiple key issues in the causal analysis of racial bias in policing. First, we formalize race and place policing, the idea that individuals of one race are policed differently when they are in neighborhoods primarily made up of individuals of other races. We develop an estimand to study this question rigorously, show the assumptions necessary for causal identification, and develop sensitivity analyses to assess robustness to violations of key assumptions. Additionally, we investigate difficulties with existing estimands targeting racial bias in policing. We show for these estimands, and the estimands developed in this manuscript, that estimation can benefit from incorporating mobility data into analyses. We apply these ideas to a study in New York City, where we find a large amount of racial bias, as well as race and place policing, and that these findings are robust to large violations of untestable assumptions. We additionally show that mobility data can make substantial impacts on the resulting estimates, suggesting it should be used whenever possible in subsequent studies.

Citations: 0

Randomized Spline Trees for Functional Data Classification: Theory and Application to Environmental Time Series

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.07879

Donato Riccio, Fabrizio Maturo, Elvira Romano

Abstract: Functional data analysis (FDA) and ensemble learning can be powerful tools for analyzing complex environmental time series. Recent literature has highlighted the key role of diversity in enhancing accuracy and reducing variance in ensemble methods. This paper introduces Randomized Spline Trees (RST), a novel algorithm that bridges these two approaches by incorporating randomized functional representations into the Random Forest framework. RST generates diverse functional representations of input data using randomized B-spline parameters, creating an ensemble of decision trees trained on these varied representations. We provide a theoretical analysis of how this functional diversity contributes to reducing generalization error and present empirical evaluations on six environmental time series classification tasks from the UCR Time Series Archive. Results show that RST variants outperform standard Random Forests and Gradient Boosting on most datasets, improving classification accuracy by up to 14%. The success of RST demonstrates the potential of adaptive functional representations in capturing complex temporal patterns in environmental data. This work contributes to the growing field of machine learning techniques focused on functional data and opens new avenues for research in environmental time series analysis.

Citations: 0

Machine Learning for Two-Sample Testing under Right-Censored Data: A Simulation Study

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.08201

Petr Philonenko, Sergey Postovalov

Abstract: The focus of this study is to evaluate the effectiveness of Machine Learning (ML) methods for two-sample testing with right-censored observations. To achieve this, we develop several ML-based methods with varying architectures and implement them as two-sample tests. Each method is an ensemble (stacking) that combines predictions from classical two-sample tests. This paper presents the results of training the proposed ML methods, examines their statistical power compared to classical two-sample tests, analyzes the distribution of test statistics for the proposed methods when the null hypothesis is true, and evaluates the significance of the features incorporated into the proposed methods. All results from numerical experiments were obtained from a synthetic dataset generated using the Smirnov transform (Inverse Transform Sampling) and replicated multiple times through Monte Carlo simulation. To test the two-sample problem with right-censored observations, one can use the proposed two-sample methods. All necessary materials (source code, example scripts, dataset, and samples) are available on GitHub and Hugging Face.
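
The abstract mentions generating synthetic right-censored data with the Smirnov transform (inverse transform sampling). As a rough illustration of that idea only, and not the authors' generator, the sketch below assumes exponential event and censoring times; the function names, rates, and seed are hypothetical choices.

```python
import math
import random


def exp_quantile(u, rate):
    """Inverse CDF of an exponential distribution: F^{-1}(u) = -ln(1 - u) / rate."""
    return -math.log(1.0 - u) / rate


def simulate_right_censored(n, event_rate, censor_rate, rng):
    """Draw n right-censored observations as (observed_time, event_observed) pairs.

    Event and censoring times are each obtained by pushing a Uniform[0, 1)
    draw through the inverse CDF (assumed exponential here). The observed
    time is the smaller of the two.
    """
    sample = []
    for _ in range(n):
        event_time = exp_quantile(rng.random(), event_rate)
        censor_time = exp_quantile(rng.random(), censor_rate)
        sample.append((min(event_time, censor_time), event_time <= censor_time))
    return sample


# Hypothetical settings: one Monte Carlo replicate of size 1000.
replicate = simulate_right_censored(1000, event_rate=1.0, censor_rate=0.5,
                                    rng=random.Random(0))
observed_fraction = sum(1 for _, observed in replicate if observed) / len(replicate)
```

With independent exponential times, the expected share of uncensored observations is `event_rate / (event_rate + censor_rate)`, so changing `censor_rate` is a simple way to control the censoring level in a simulation of this kind.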

Citations: 0

Cellwise outlier detection in heterogeneous populations

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.07881

Giorgia Zaccaria, Luis A. García-Escudero, Francesca Greselin, Agustín Mayo-Íscar

Abstract: Real-world applications may be affected by outlying values. In the model-based clustering literature, several methodologies have been proposed to detect units that deviate from the majority of the data (rowwise outliers) and trim them from the parameter estimates. However, the discarded observations can encompass valuable information in some observed features. Following the more recent cellwise contamination paradigm, we introduce a Gaussian mixture model for cellwise outlier detection. The proposal is estimated via an Expectation-Maximization (EM) algorithm with an additional step for flagging the contaminated cells of a data matrix and then imputing them, instead of discarding them, before the parameter estimation. This procedure adheres to the spirit of the EM algorithm by treating the contaminated cells as missing values. We analyze the performance of the proposed model in comparison with other existing methodologies through a simulation study with different scenarios and illustrate its potential use for clustering, outlier detection, and imputation on three real data sets.
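
To illustrate the difference between flagging individual cells and discarding whole rows, here is a deliberately simplified sketch. It is not the paper's Gaussian mixture EM procedure: it swaps in a univariate rule, flagging a cell when its robust z-score (column median and MAD) exceeds a cutoff and imputing the column median. The function name, the cutoff of 3.5, and the toy matrix are assumptions made for the example.

```python
from statistics import median


def flag_and_impute(matrix, cutoff=3.5):
    """Flag outlying cells column by column and impute them.

    Simplified stand-in for a model-based approach: a cell is flagged when
    |value - column median| / (1.4826 * MAD) exceeds `cutoff`, and flagged
    cells are replaced by the column median. The rest of the row is kept.
    """
    n_cols = len(matrix[0])
    cleaned = [list(row) for row in matrix]
    flags = [[False] * n_cols for _ in matrix]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        center = median(column)
        mad = median(abs(value - center) for value in column)
        scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
        if scale == 0:
            continue  # constant column: nothing can be flagged with this rule
        for i, value in enumerate(column):
            if abs(value - center) / scale > cutoff:
                flags[i][j] = True
                cleaned[i][j] = center
    return cleaned, flags


# Hypothetical data: only the second cell of the fourth row is contaminated.
toy = [[1.0, 10.0], [1.2, 10.5], [0.9, 9.8], [1.1, 95.0], [1.0, 10.2]]
toy_cleaned, toy_flags = flag_and_impute(toy)
```

In the toy matrix, a rowwise rule would drop the entire fourth row, while the cellwise rule keeps its uncontaminated first value and replaces only the flagged cell.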

Citations: 0

Robust and efficient estimation in the presence of a randomly censored covariate

arXiv - STAT - Methodology Pub Date : 2024-09-12 DOI: arxiv-2409.07795

Seong-ho Lee, Brian D. Richardson, Yanyuan Ma, Karen S. Marder, Tanya P. Garcia

Abstract: In Huntington's disease research, a current goal is to understand how symptoms change prior to a clinical diagnosis. Statistically, this entails modeling symptom severity as a function of the covariate 'time until diagnosis', which is often heavily right-censored in observational studies. Existing estimators that handle right-censored covariates have varying statistical efficiency and robustness to misspecified models for nuisance distributions (those of the censored covariate and censoring variable). On one extreme, complete case estimation, which utilizes uncensored data only, is free of nuisance distribution models but discards informative censored observations. On the other extreme, maximum likelihood estimation is maximally efficient but inconsistent when the covariate's distribution is misspecified. We propose a semiparametric estimator that is robust and efficient. When the nuisance distributions are modeled parametrically, the estimator is doubly robust, i.e., consistent if at least one distribution is correctly specified, and semiparametric efficient if both models are correctly specified. When the nuisance distributions are estimated via nonparametric or machine learning methods, the estimator is consistent and semiparametric efficient. We show empirically that the proposed estimator, implemented in the R package sparcc, has its claimed properties, and we apply it to study Huntington's disease symptom trajectories using data from the Enroll-HD study.
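
The complete case estimator described at one extreme of the abstract can be sketched in a few lines. This is only the baseline the abstract contrasts against, not the proposed semiparametric estimator and not the interface of the `sparcc` package; the simple linear outcome model, the function name, and the toy data are assumptions for illustration.

```python
def complete_case_ols(x_observed, uncensored, y):
    """Least-squares fit of y on x using only rows whose covariate is uncensored.

    x_observed : covariate value, or the censoring value when censored
    uncensored : 1 if the covariate was fully observed, 0 if right-censored
    y          : outcome
    Returns (intercept, slope) of an assumed simple linear model.
    """
    rows = [(x, yi) for x, d, yi in zip(x_observed, uncensored, y) if d == 1]
    if len(rows) < 2:
        raise ValueError("need at least two uncensored observations")
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(yi for _, yi in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    if sxx == 0:
        raise ValueError("uncensored covariate values are all identical")
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: the last row has a censored covariate and is dropped.
cc_intercept, cc_slope = complete_case_ols(
    x_observed=[1.0, 2.0, 3.0, 4.0, 2.5],
    uncensored=[1, 1, 1, 1, 0],
    y=[3.0, 5.0, 7.0, 9.0, 100.0],
)
```

No model for the covariate or the censoring variable is needed here, which is the appeal of this estimator, but every censored row is thrown away, which is the loss of information that the abstract points to.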

Citations: 0