{"title":"Exact confidence intervals for functions of parameters in the k-sample multinomial problem","authors":"Michael C Sachs, Erin E Gabriel, Michael P Fay","doi":"arxiv-2406.19141","DOIUrl":"https://doi.org/arxiv-2406.19141","url":null,"abstract":"When the target of inference is a real-valued function of probability\u0000parameters in the k-sample multinomial problem, variance estimation may be\u0000challenging. In small samples, methods like the nonparametric bootstrap or\u0000delta method may perform poorly. We propose a novel general method in this\u0000setting for computing exact p-values and confidence intervals which means that\u0000type I error rates are correctly bounded and confidence intervals have at least\u0000nominal coverage at all sample sizes. Our method is applicable to any\u0000real-valued function of multinomial probabilities, accommodating an arbitrary\u0000number of samples with varying category counts. We describe the method and\u0000provide an implementation of it in R, with some computational optimization to\u0000ensure broad applicability. Simulations demonstrate our method's ability to\u0000maintain correct coverage rates in settings where the nonparametric bootstrap\u0000fails.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Torchtree: flexible phylogenetic model development and inference using PyTorch","authors":"Mathieu Fourment, Matthew Macaulay, Christiaan J Swanepoel, Xiang Ji, Marc A Suchard, Frederick A Matsen IV","doi":"arxiv-2406.18044","DOIUrl":"https://doi.org/arxiv-2406.18044","url":null,"abstract":"Bayesian inference has predominantly relied on the Markov chain Monte Carlo\u0000(MCMC) algorithm for many years. However, MCMC is computationally laborious,\u0000especially for complex phylogenetic models of time trees. This bottleneck has\u0000led to the search for alternatives, such as variational Bayes, which can scale\u0000better to large datasets. In this paper, we introduce torchtree, a framework\u0000written in Python that allows developers to easily implement rich phylogenetic\u0000models and algorithms using a fixed tree topology. One can either use automatic\u0000differentiation, or leverage torchtree's plug-in system to compute gradients\u0000analytically for model components for which automatic differentiation is slow.\u0000We demonstrate that the torchtree variational inference framework performs\u0000similarly to BEAST in terms of speed and approximation accuracy. Furthermore,\u0000we explore the use of the forward KL divergence as an optimizing criterion for\u0000variational inference, which can handle discontinuous and non-differentiable\u0000models. 
Our experiments show that inference using the forward KL divergence\u0000tends to be faster per iteration compared to the evidence lower bound (ELBO)\u0000criterion, although the ELBO-based inference may converge faster in some cases.\u0000Overall, torchtree provides a flexible and efficient framework for phylogenetic\u0000model development and inference using PyTorch.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation","authors":"Jian Cao, Matthias Katzfuss","doi":"arxiv-2406.17307","DOIUrl":"https://doi.org/arxiv-2406.17307","url":null,"abstract":"We propose a linear-complexity method for sampling from truncated\u0000multivariate normal (TMVN) distributions with high fidelity by applying\u0000nearest-neighbor approximations to a product-of-conditionals decomposition of\u0000the TMVN density. To make the sequential sampling based on the decomposition\u0000feasible, we introduce a novel method that avoids the intractable\u0000high-dimensional TMVN distribution by sampling sequentially from\u0000$m$-dimensional TMVN distributions, where $m$ is a tuning parameter controlling\u0000the fidelity. This allows us to overcome the existing methods' crucial problem\u0000of rapidly decreasing acceptance rates for increasing dimension. Throughout our\u0000experiments with up to tens of thousands of dimensions, we can produce\u0000high-fidelity samples with $m$ in the dozens, achieving superior scalability\u0000compared to existing state-of-the-art methods. 
We study a tetrachloroethylene\u0000concentration dataset that has $3{,}971$ observed responses and $20{,}730$\u0000undetected responses, together modeled as a partially censored Gaussian\u0000process, where our method enables posterior inference for the censored\u0000responses through sampling a $20{,}730$-dimensional TMVN distribution.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141523218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genealogical processes of non-neutral population models under rapid mutation","authors":"Jere Koskela, Paul A. Jenkins, Adam M. Johansen, Dario Spano","doi":"arxiv-2406.16465","DOIUrl":"https://doi.org/arxiv-2406.16465","url":null,"abstract":"We show that genealogical trees arising from a broad class of non-neutral\u0000models of population evolution converge to the Kingman coalescent under a\u0000suitable rescaling of time. As well as non-neutral biological evolution, our\u0000results apply to genetic algorithms encompassing the prominent class of\u0000sequential Monte Carlo (SMC) methods. The time rescaling we need differs\u0000slightly from that used in classical results for convergence to the Kingman\u0000coalescent, which has implications for the performance of different resampling\u0000schemes in SMC algorithms. In addition, our work substantially simplifies\u0000earlier proofs of convergence to the Kingman coalescent, and corrects an error\u0000common to several earlier results.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141523192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive variational Gaussian approximation with the Whittle likelihood for linear non-Gaussian state space models","authors":"Bao Anh Vu, David Gunawan, Andrew Zammit-Mangion","doi":"arxiv-2406.15998","DOIUrl":"https://doi.org/arxiv-2406.15998","url":null,"abstract":"Parameter inference for linear and non-Gaussian state space models is\u0000challenging because the likelihood function contains an intractable integral\u0000over the latent state variables. Exact inference using Markov chain Monte Carlo\u0000is computationally expensive, particularly for long time series data.\u0000Variational Bayes methods are useful when exact inference is infeasible. These\u0000methods approximate the posterior density of the parameters by a simple and\u0000tractable distribution found through optimisation. In this paper, we propose a\u0000novel sequential variational Bayes approach that makes use of the Whittle\u0000likelihood for computationally efficient parameter inference in this class of\u0000state space models. Our algorithm, which we call Recursive Variational Gaussian\u0000Approximation with the Whittle Likelihood (R-VGA-Whittle), updates the\u0000variational parameters by processing data in the frequency domain. 
At each\u0000iteration, R-VGA-Whittle requires the gradient and Hessian of the Whittle\u0000log-likelihood, which are available in closed form for a wide class of models.\u0000Through several examples using a linear Gaussian state space model and a\u0000univariate/bivariate non-Gaussian stochastic volatility model, we show that\u0000R-VGA-Whittle provides good approximations to posterior distributions of the\u0000parameters and is very computationally efficient when compared to\u0000asymptotically exact methods such as Hamiltonian Monte Carlo.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Multivariate Initial Sequence Estimators for MCMC","authors":"Arka Banerjee, Dootika Vats","doi":"arxiv-2406.15874","DOIUrl":"https://doi.org/arxiv-2406.15874","url":null,"abstract":"Estimating Monte Carlo error is critical to valid simulation results in\u0000Markov chain Monte Carlo (MCMC) and initial sequence estimators were one of the\u0000first methods introduced for this. Over the last few years, focus has been on\u0000multivariate assessment of simulation error, and many multivariate\u0000generalizations of univariate methods have been developed. The multivariate\u0000initial sequence estimator is known to exhibit superior finite-sample\u0000performance compared to its competitors. However, the multivariate initial\u0000sequence estimator can be prohibitively slow, limiting its widespread use. We\u0000provide an efficient alternative to the multivariate initial sequence estimator\u0000that inherits both its asymptotic properties as well as the finite-sample\u0000superior performance. The effectiveness of the proposed estimator is shown via\u0000some MCMC example implementations. Further, we also present univariate and\u0000multivariate initial sequence estimators for when parallel MCMC chains are run\u0000and demonstrate their effectiveness over popular alternative.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141523219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate Bayesian Computation sequential Monte Carlo via random forests","authors":"Khanh N. Dinh, Zijin Xiang, Zhihan Liu, Simon Tavaré","doi":"arxiv-2406.15865","DOIUrl":"https://doi.org/arxiv-2406.15865","url":null,"abstract":"Approximate Bayesian Computation (ABC) is a popular inference method when\u0000likelihoods are hard to come by. Practical bottlenecks of ABC applications\u0000include selecting statistics that summarize the data without losing too much\u0000information or introducing uncertainty, and choosing distance functions and\u0000tolerance thresholds that balance accuracy and computational efficiency. Recent\u0000studies have shown that ABC methods using random forest (RF) methodology\u0000perform well while circumventing many of ABC's drawbacks. However, RF\u0000construction is computationally expensive for large numbers of trees and model\u0000simulations, and there can be high uncertainty in the posterior if the prior\u0000distribution is uninformative. Here we adapt distributional random forests to\u0000the ABC setting, and introduce Approximate Bayesian Computation sequential\u0000Monte Carlo with random forests (ABC-SMC-(D)RF). This updates the prior\u0000distribution iteratively to focus on the most likely regions in the parameter\u0000space. 
We show that ABC-SMC-(D)RF can accurately infer posterior distributions\u0000for a wide range of deterministic and stochastic models in different scientific\u0000areas.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141523220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An agent-based model of behaviour change calibrated to reversal learning data","authors":"Roben Delos Reyes, Hugo Lyons Keenan, Cameron Zachreson","doi":"arxiv-2406.14062","DOIUrl":"https://doi.org/arxiv-2406.14062","url":null,"abstract":"Behaviour change lies at the heart of many observable collective phenomena\u0000such as the transmission and control of infectious diseases, adoption of public\u0000health policies, and migration of animals to new habitats. Representing the\u0000process of individual behaviour change in computer simulations of these\u0000phenomena remains an open challenge. Often, computational models use\u0000phenomenological implementations with limited support from behavioural data.\u0000Without a strong connection to observable quantities, such models have limited\u0000utility for simulating observed and counterfactual scenarios of emergent\u0000phenomena because they cannot be validated or calibrated. Here, we present a\u0000simple stochastic individual-based model of reversal learning that captures\u0000fundamental properties of individual behaviour change, namely, the capacity to\u0000learn based on accumulated reward signals, and the transient persistence of\u0000learned behaviour after rewards are removed or altered. The model has only two\u0000parameters, and we use approximate Bayesian computation to demonstrate that\u0000they are fully identifiable from empirical reversal learning time series data.\u0000Finally, we demonstrate how the model can be extended to account for the\u0000increased complexity of behavioural dynamics over longer time scales involving\u0000fluctuating stimuli. 
This work is a step towards the development and evaluation\u0000of fully identifiable individual-level behaviour change models that can\u0000function as validated submodels for complex simulations of collective behaviour\u0000change.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level Phenotypic Models of Cardiovascular Disease and Obstructive Sleep Apnea Comorbidities: A Longitudinal Wisconsin Sleep Cohort Study","authors":"Duy Nguyen, Ca Hoang, Phat K. Huynh, Tien Truong, Dang Nguyen, Abhay Sharma, Trung Q. Le","doi":"arxiv-2406.18602","DOIUrl":"https://doi.org/arxiv-2406.18602","url":null,"abstract":"Cardiovascular diseases (CVDs) are notably prevalent among patients with\u0000obstructive sleep apnea (OSA), posing unique challenges in predicting CVD\u0000progression due to the intricate interactions of comorbidities. Traditional\u0000models typically lack the necessary dynamic and longitudinal scope to\u0000accurately forecast CVD trajectories in OSA patients. This study introduces a\u0000novel multi-level phenotypic model to analyze the progression and interplay of\u0000these conditions over time, utilizing data from the Wisconsin Sleep Cohort,\u0000which includes 1,123 participants followed for decades. Our methodology\u0000comprises three advanced steps: (1) Conducting feature importance analysis\u0000through tree-based models to underscore critical predictive variables like\u0000total cholesterol, low-density lipoprotein (LDL), and diabetes. (2) Developing\u0000a logistic mixed-effects model (LGMM) to track longitudinal transitions and\u0000pinpoint significant factors, which displayed a diagnostic accuracy of 0.9556.\u0000(3) Implementing t-distributed Stochastic Neighbor Embedding (t-SNE) alongside\u0000Gaussian Mixture Models (GMM) to segment patient data into distinct phenotypic\u0000clusters that reflect varied risk profiles and disease progression pathways.\u0000This phenotypic clustering revealed two main groups, with one showing a\u0000markedly increased risk of major adverse cardiovascular events (MACEs),\u0000underscored by the significant predictive role of nocturnal hypoxia and\u0000sympathetic nervous system activity from sleep data. 
Analysis of transitions\u0000and trajectories with t-SNE and GMM highlighted different progression rates\u0000within the cohort, with one cluster progressing more slowly towards severe CVD\u0000states than the other. This study offers a comprehensive understanding of the\u0000dynamic relationship between CVD and OSA, providing valuable tools for\u0000predicting disease onset and tailoring treatment approaches.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelizing MCMC with Machine Learning Classifier and Its Criterion Based on Kullback-Leibler Divergence","authors":"Tomoki Matsumoto, Yuichiro Kanazawa","doi":"arxiv-2406.11246","DOIUrl":"https://doi.org/arxiv-2406.11246","url":null,"abstract":"In the era of Big Data, analyzing high-dimensional and large datasets\u0000presents significant computational challenges. Although Bayesian statistics is\u0000well-suited for these complex data structures, Markov chain Monte Carlo (MCMC)\u0000method, which are essential for Bayesian estimation, suffers from computation\u0000cost because of its sequential nature. For faster and more effective\u0000computation, this paper introduces an algorithm to enhance a parallelizing MCMC\u0000method to handle this computation problem. We highlight the critical role of\u0000the overlapped area of posterior distributions after data partitioning, and\u0000propose a method using a machine learning classifier to effectively identify\u0000and extract MCMC draws from the area to approximate the actual posterior\u0000distribution. Our main contribution is the development of a Kullback-Leibler\u0000(KL) divergence-based criterion that simplifies hyperparameter tuning in\u0000training a classifier and makes the method nearly hyperparameter-free.\u0000Simulation studies validate the efficacy of our proposed methods.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"173 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}