{"title":"Dynamic Bayesian Networks with Conditional Dynamics in Edge Addition and Deletion","authors":"Lupe S. H. Chan, Amanda M. Y. Chu, Mike K. P. So","doi":"arxiv-2409.08965","DOIUrl":"https://doi.org/arxiv-2409.08965","url":null,"abstract":"This study presents a dynamic Bayesian network framework that facilitates intuitive gradual edge changes. We use two conditional dynamics to model the edge addition and deletion, and edge selection separately. Unlike previous research that uses a mixture network approach, which restricts the number of possible edge changes, or structural priors to induce gradual changes, which can lead to unclear network evolution, our model induces more frequent and intuitive edge change dynamics. We employ Markov chain Monte Carlo (MCMC) sampling to estimate the model structures and parameters and demonstrate the model's effectiveness in a portfolio selection application.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"203 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fused $L_{1/2}$ prior for large scale linear inverse problem with Gibbs bouncy particle sampler","authors":"Xiongwen Ke, Yanan Fan, Qingping Zhou","doi":"arxiv-2409.07874","DOIUrl":"https://doi.org/arxiv-2409.07874","url":null,"abstract":"In this paper, we study Bayesian approach for solving large scale linear inverse problems arising in various scientific and engineering fields. We propose a fused $L_{1/2}$ prior with edge-preserving and sparsity-promoting properties and show that it can be formulated as a Gaussian mixture Markov random field. Since the density function of this family of prior is neither log-concave nor Lipschitz, gradient-based Markov chain Monte Carlo methods can not be applied to sample the posterior. Thus, we present a Gibbs sampler in which all the conditional posteriors involved have closed form expressions. The Gibbs sampler works well for small size problems but it is computationally intractable for large scale problems due to the need for sample high dimensional Gaussian distribution. To reduce the computation burden, we construct a Gibbs bouncy particle sampler (Gibbs-BPS) based on a piecewise deterministic Markov process. This new sampler combines elements of Gibbs sampler with bouncy particle sampler and its computation complexity is an order of magnitude smaller. We show that the new sampler converges to the target distribution. With computed tomography examples, we demonstrate that the proposed method shows competitive performance with existing popular Bayesian methods and is highly efficient in large scale problems.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Recent Advances in Gaussian Process Regression Methods","authors":"Chenyi Lyu, Xingchi Liu, Lyudmila Mihaylova","doi":"arxiv-2409.08112","DOIUrl":"https://doi.org/arxiv-2409.08112","url":null,"abstract":"Gaussian process (GP) methods have been widely studied recently, especially for large-scale systems with big data and even more extreme cases when data is sparse. Key advantages of these methods consist in: 1) the ability to provide inherent ways to assess the impact of uncertainties (especially in the data, and environment) on the solutions, 2) have efficient factorisation based implementations and 3) can be implemented easily in distributed manners and hence provide scalable solutions. This paper reviews the recently developed key factorised GP methods such as the hierarchical off-diagonal low-rank approximation methods and GP with Kronecker structures. An example illustrates the performance of these methods with respect to accuracy and computational complexity.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Community detection in multi-layer networks by regularized debiased spectral clustering","authors":"Huan Qing","doi":"arxiv-2409.07956","DOIUrl":"https://doi.org/arxiv-2409.07956","url":null,"abstract":"Community detection is a crucial problem in the analysis of multi-layer networks. In this work, we introduce a new method, called regularized debiased sum of squared adjacency matrices (RDSoS), to detect latent communities in multi-layer networks. RDSoS is developed based on a novel regularized Laplacian matrix that regularizes the debiased sum of squared adjacency matrices. In contrast, the classical regularized Laplacian matrix typically regularizes the adjacency matrix of a single-layer network. Therefore, at a high level, our regularized Laplacian matrix extends the classical regularized Laplacian matrix to multi-layer networks. We establish the consistency property of RDSoS under the multi-layer stochastic block model (MLSBM) and further extend RDSoS and its theoretical results to the degree-corrected version of the MLSBM model. The effectiveness of the proposed methods is evaluated and demonstrated through synthetic and real datasets.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple tests for restricted mean time lost with competing risks data","authors":"Merle Munko, Dennis Dobler, Marc Ditzhaus","doi":"arxiv-2409.07917","DOIUrl":"https://doi.org/arxiv-2409.07917","url":null,"abstract":"Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to two-sample comparisons and mostly to two event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, e.g., when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real data example about leukemia patients who underwent bone marrow transplantation.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"398 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal inference and racial bias in policing: New estimands and the importance of mobility data","authors":"Zhuochao Huang, Brenden Beck, Joseph Antonelli","doi":"arxiv-2409.08059","DOIUrl":"https://doi.org/arxiv-2409.08059","url":null,"abstract":"Studying racial bias in policing is a critically important problem, but one that comes with a number of inherent difficulties due to the nature of the available data. In this manuscript we tackle multiple key issues in the causal analysis of racial bias in policing. First, we formalize race and place policing, the idea that individuals of one race are policed differently when they are in neighborhoods primarily made up of individuals of other races. We develop an estimand to study this question rigorously, show the assumptions necessary for causal identification, and develop sensitivity analyses to assess robustness to violations of key assumptions. Additionally, we investigate difficulties with existing estimands targeting racial bias in policing. We show for these estimands, and the estimands developed in this manuscript, that estimation can benefit from incorporating mobility data into analyses. We apply these ideas to a study in New York City, where we find a large amount of racial bias, as well as race and place policing, and that these findings are robust to large violations of untestable assumptions. We additionally show that mobility data can make substantial impacts on the resulting estimates, suggesting it should be used whenever possible in subsequent studies.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized Spline Trees for Functional Data Classification: Theory and Application to Environmental Time Series","authors":"Donato Riccio, Fabrizio Maturo, Elvira Romano","doi":"arxiv-2409.07879","DOIUrl":"https://doi.org/arxiv-2409.07879","url":null,"abstract":"Functional data analysis (FDA) and ensemble learning can be powerful tools for analyzing complex environmental time series. Recent literature has highlighted the key role of diversity in enhancing accuracy and reducing variance in ensemble methods. This paper introduces Randomized Spline Trees (RST), a novel algorithm that bridges these two approaches by incorporating randomized functional representations into the Random Forest framework. RST generates diverse functional representations of input data using randomized B-spline parameters, creating an ensemble of decision trees trained on these varied representations. We provide a theoretical analysis of how this functional diversity contributes to reducing generalization error and present empirical evaluations on six environmental time series classification tasks from the UCR Time Series Archive. Results show that RST variants outperform standard Random Forests and Gradient Boosting on most datasets, improving classification accuracy by up to 14%. The success of RST demonstrates the potential of adaptive functional representations in capturing complex temporal patterns in environmental data. This work contributes to the growing field of machine learning techniques focused on functional data and opens new avenues for research in environmental time series analysis.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for Two-Sample Testing under Right-Censored Data: A Simulation Study","authors":"Petr Philonenko, Sergey Postovalov","doi":"arxiv-2409.08201","DOIUrl":"https://doi.org/arxiv-2409.08201","url":null,"abstract":"The focus of this study is to evaluate the effectiveness of Machine Learning (ML) methods for two-sample testing with right-censored observations. To achieve this, we develop several ML-based methods with varying architectures and implement them as two-sample tests. Each method is an ensemble (stacking) that combines predictions from classical two-sample tests. This paper presents the results of training the proposed ML methods, examines their statistical power compared to classical two-sample tests, analyzes the distribution of test statistics for the proposed methods when the null hypothesis is true, and evaluates the significance of the features incorporated into the proposed methods. All results from numerical experiments were obtained from a synthetic dataset generated using the Smirnov transform (Inverse Transform Sampling) and replicated multiple times through Monte Carlo simulation. To test the two-sample problem with right-censored observations, one can use the proposed two-sample methods. All necessary materials (source code, example scripts, dataset, and samples) are available on GitHub and Hugging Face.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cellwise outlier detection in heterogeneous populations","authors":"Giorgia Zaccaria, Luis A. García-Escudero, Francesca Greselin, Agustín Mayo-Íscar","doi":"arxiv-2409.07881","DOIUrl":"https://doi.org/arxiv-2409.07881","url":null,"abstract":"Real-world applications may be affected by outlying values. In the model-based clustering literature, several methodologies have been proposed to detect units that deviate from the majority of the data (rowwise outliers) and trim them from the parameter estimates. However, the discarded observations can encompass valuable information in some observed features. Following the more recent cellwise contamination paradigm, we introduce a Gaussian mixture model for cellwise outlier detection. The proposal is estimated via an Expectation-Maximization (EM) algorithm with an additional step for flagging the contaminated cells of a data matrix and then imputing -- instead of discarding -- them before the parameter estimation. This procedure adheres to the spirit of the EM algorithm by treating the contaminated cells as missing values. We analyze the performance of the proposed model in comparison with other existing methodologies through a simulation study with different scenarios and illustrate its potential use for clustering, outlier detection, and imputation on three real data sets.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and efficient estimation in the presence of a randomly censored covariate","authors":"Seong-ho Lee, Brian D. Richardson, Yanyuan Ma, Karen S. Marder, Tanya P. Garcia","doi":"arxiv-2409.07795","DOIUrl":"https://doi.org/arxiv-2409.07795","url":null,"abstract":"In Huntington's disease research, a current goal is to understand how symptoms change prior to a clinical diagnosis. Statistically, this entails modeling symptom severity as a function of the covariate 'time until diagnosis', which is often heavily right-censored in observational studies. Existing estimators that handle right-censored covariates have varying statistical efficiency and robustness to misspecified models for nuisance distributions (those of the censored covariate and censoring variable). On one extreme, complete case estimation, which utilizes uncensored data only, is free of nuisance distribution models but discards informative censored observations. On the other extreme, maximum likelihood estimation is maximally efficient but inconsistent when the covariate's distribution is misspecified. We propose a semiparametric estimator that is robust and efficient. When the nuisance distributions are modeled parametrically, the estimator is doubly robust, i.e., consistent if at least one distribution is correctly specified, and semiparametric efficient if both models are correctly specified. When the nuisance distributions are estimated via nonparametric or machine learning methods, the estimator is consistent and semiparametric efficient. We show empirically that the proposed estimator, implemented in the R package sparcc, has its claimed properties, and we apply it to study Huntington's disease symptom trajectories using data from the Enroll-HD study.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}