{"title":"An efficient heuristic for approximate maximum flow computations","authors":"Jingyun Qian, Georg Hahn","doi":"arxiv-2409.08350","DOIUrl":"https://doi.org/arxiv-2409.08350","url":null,"abstract":"Several concepts borrowed from graph theory are routinely used to better understand the inner workings of the (human) brain. To this end, a connectivity network of the brain is built first, which then allows one to assess quantities such as information flow and information routing via shortest path and maximum flow computations. Since brain networks typically contain several thousand nodes and edges, computational scaling is a key research area. In this contribution, we focus on approximate maximum flow computations in large brain networks. By combining graph partitioning with maximum flow computations, we propose a new approximation algorithm for the computation of the maximum flow with runtime O(|V||E|^2/k^2) compared to the usual runtime of O(|V||E|^2) for the Edmonds-Karp algorithm, where $V$ is the set of vertices, $E$ is the set of edges, and $k$ is the number of partitions. We assess both accuracy and runtime of the proposed algorithm on simulated graphs as well as on graphs downloaded from the Brain Networks Data Repository (https://networkrepository.com).","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Debiased high-dimensional regression calibration for errors-in-variables log-contrast models","authors":"Huali Zhao, Tianying Wang","doi":"arxiv-2409.07568","DOIUrl":"https://doi.org/arxiv-2409.07568","url":null,"abstract":"Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in a microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Order selection in GARMA models for count time series: a Bayesian perspective","authors":"Katerine Zuniga Lastra, Guilherme Pumi, Taiane Schaedler Prass","doi":"arxiv-2409.07263","DOIUrl":"https://doi.org/arxiv-2409.07263","url":null,"abstract":"Estimation in GARMA models has traditionally been carried out under the frequentist approach. To date, Bayesian approaches for such estimation have been relatively limited. In the context of GARMA models for count time series, Bayesian estimation achieves satisfactory results in terms of point estimation. Model selection in this context often relies on the use of information criteria. Despite its prominence in the literature, the use of information criteria for model selection in GARMA models for count time series has been shown to perform poorly in simulations, especially in terms of the ability to correctly identify models, even under large sample sizes. In this study, we address the problem of order selection in GARMA models for count time series, adopting a Bayesian perspective through the application of the Reversible Jump Markov Chain Monte Carlo approach. Monte Carlo simulation studies are conducted to assess the finite sample performance of the developed ideas, including point and interval inference, sensitivity analysis, effects of burn-in and thinning, as well as the choice of related priors and hyperparameters. Two real-data applications are presented, one considering automobile production in Brazil and the other considering bus exportation in Brazil before and after the COVID-19 pandemic, showcasing the method's capabilities and further exploring its flexibility.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Deep Convolutional Neural Networks","authors":"Qi Wang, Paul A. Parker, Robert B. Lund","doi":"arxiv-2409.07559","DOIUrl":"https://doi.org/arxiv-2409.07559","url":null,"abstract":"Spatial prediction problems often use Gaussian process models, which can be computationally burdensome in high dimensions. Specification of an appropriate covariance function for the model can be challenging when complex non-stationarities exist. Recent work has shown that pre-computed spatial basis functions and a feed-forward neural network can capture complex spatial dependence structures while remaining computationally efficient. This paper builds on this literature by tailoring spatial basis functions for use in convolutional neural networks. Through both simulated and real data, we demonstrate that this approach yields more accurate spatial predictions than existing methods. Uncertainty quantification is also considered.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Sequential MCMC for Data Assimilation with Applications in Geoscience","authors":"Hamza Ruzayqat, Omar Knio","doi":"arxiv-2409.07111","DOIUrl":"https://doi.org/arxiv-2409.07111","url":null,"abstract":"This paper presents a new data assimilation (DA) scheme based on a sequential Markov Chain Monte Carlo (SMCMC) DA technique [Ruzayqat et al. 2024] which is provably convergent and has been recently used for filtering, particularly for high-dimensional non-linear, and potentially non-Gaussian, state-space models. Unlike particle filters, which can be considered exact methods and can be used for filtering non-linear, non-Gaussian models, SMCMC does not assign weights to the samples/particles, and therefore the method does not suffer from the issue of weight-degeneracy when a relatively small number of samples is used. We design a localization approach within the SMCMC framework that focuses on regions where observations are located and restricts the transition densities included in the filtering distribution of the state to these regions. This greatly reduces the effective degrees of freedom and thus improves efficiency. We test the new technique on a high-dimensional ($d \\sim 10^4 - 10^5$) linear Gaussian model and on non-linear shallow water models with Gaussian noise, using real and synthetic observations. For two of the numerical examples, the observations mimic the data generated by the Surface Water and Ocean Topography (SWOT) mission led by NASA, which is a swath of ocean height observations that changes location at every assimilation time step. We also use a set of real observations from ocean drifters, in which the drifters move according to the ocean kinematics and are assumed to have uncertain locations at the time of assimilation. We show that when higher accuracy is required, the proposed algorithm is superior in terms of efficiency and accuracy over competing ensemble methods and the original SMCMC filter.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"78 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustered Factor Analysis for Multivariate Spatial Data","authors":"Yanxiu Jin, Tomoya Wakayama, Renhe Jiang, Shonosuke Sugasawa","doi":"arxiv-2409.07018","DOIUrl":"https://doi.org/arxiv-2409.07018","url":null,"abstract":"Factor analysis has been extensively used to reveal the dependence structures among multivariate variables, offering valuable insight in various fields. However, it cannot incorporate the spatial heterogeneity that is typically present in spatial data. To address this issue, we introduce an effective method specifically designed to discover the potential dependence structures in multivariate spatial data. Our approach assumes that spatial locations can be approximately divided into a finite number of clusters, with locations within the same cluster sharing similar dependence structures. By leveraging an iterative algorithm that combines spatial clustering with factor analysis, we simultaneously detect spatial clusters and estimate a unique factor model for each cluster. The proposed method is evaluated through comprehensive simulation studies, demonstrating its flexibility. In addition, we apply the proposed method to a dataset of railway station attributes in the Tokyo metropolitan area, highlighting its practical applicability and effectiveness in uncovering complex spatial dependencies.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-parametric estimation of transition intensities in interval censored Markov multi-state models without loops","authors":"Daniel Gomon, Hein Putter","doi":"arxiv-2409.07176","DOIUrl":"https://doi.org/arxiv-2409.07176","url":null,"abstract":"Panel data arises when transitions between different states are interval-censored in multi-state data. The analysis of such data using non-parametric multi-state models was not possible until recently, but is very desirable as it allows for more flexibility than its parametric counterparts. The single available result to date has some unique drawbacks. We propose a non-parametric estimator of the transition intensities for panel data using an Expectation Maximisation algorithm. The method allows for a mix of interval-censored and right-censored (exactly observed) transitions. A condition to check for the convergence of the algorithm to the non-parametric maximum likelihood estimator is given. A simulation study compares the proposed estimator to a consistent estimator and shows that it yields near-identical estimates at smaller computational cost. A data set on the emergence of teeth in children is analysed. Code to perform the analyses is publicly available.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Bayesian Networks, Elicitation and Data Embedding for Secure Environments","authors":"Kieran Drury, Jim Q. Smith","doi":"arxiv-2409.07389","DOIUrl":"https://doi.org/arxiv-2409.07389","url":null,"abstract":"Serious crime modelling typically needs to be undertaken securely behind a firewall where police knowledge and capabilities can remain undisclosed. Data informing an ongoing incident is often sparse, with a large proportion of relevant data only coming to light after the incident culminates or after police intervene - by which point it is too late to make use of the data to aid real-time decision making for the incident in question. Much of the data that is available to police to support real-time decision making is highly confidential so cannot be shared with academics, and is therefore missing to them. In this paper, we describe the development of a formal protocol where a graphical model is used as a framework for securely translating a model designed by an academic team to a model for use by a police team. We then show, for the first time, how libraries of these models can be built and used for real-time decision support to circumvent the challenges of data missingness and tardiness seen in such a secure environment. The parallel development described by this protocol ensures that any sensitive information collected by police, and missing to academics, remains secured behind a firewall. The protocol nevertheless guides police so that they are able to combine the typically incomplete data streams that are open source with their more sensitive information in a formal and justifiable way. We illustrate the application of this protocol by describing how a new entry - a suspected vehicle attack - can be embedded into such a police library of criminal plots.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Effects of Continuous Instruments without Positivity","authors":"Prabrisha Rakshit, Alexander Levis, Luke Keele","doi":"arxiv-2409.07350","DOIUrl":"https://doi.org/arxiv-2409.07350","url":null,"abstract":"Instrumental variables have become a popular study design for the estimation of treatment effects in the presence of unobserved confounders. In the canonical instrumental variables design, the instrument is a binary variable, and most extant methods are tailored to this context. In many settings, however, the instrument is a continuous measure. Standard estimation methods can be applied with continuous instruments, but they require strong assumptions regarding functional form. Moreover, while some recent work has introduced more flexible approaches for continuous instruments, these methods require an assumption known as positivity that is unlikely to hold in many applications. We derive a novel family of causal estimands using a stochastic dynamic intervention framework that considers a range of intervention distributions that are absolutely continuous with respect to the observed distribution of the instrument. These estimands focus on a specific form of local effect but do not require a positivity assumption. Next, we develop doubly robust estimators for these estimands that allow for estimation of the nuisance functions via nonparametric estimators. We use empirical process theory and sample splitting to derive asymptotic properties of the proposed estimators under weak conditions. In addition, we derive methods for profiling the principal strata as well as a method for sensitivity analysis for assessing robustness to an underlying monotonicity assumption. We evaluate our methods via simulation and demonstrate their feasibility using an application on the effectiveness of surgery for specific emergency conditions.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determining number of factors under stability considerations","authors":"Sze Ming Lee, Yunxiao Chen","doi":"arxiv-2409.07617","DOIUrl":"https://doi.org/arxiv-2409.07617","url":null,"abstract":"This paper proposes a novel method for determining the number of factors in linear factor models under stability considerations. An instability measure is proposed based on the principal angle between the estimated loading spaces obtained by data splitting. Based on this measure, criteria for determining the number of factors are proposed and shown to be consistent. This consistency is obtained using results from random matrix theory, especially the complete delocalization of non-outlier eigenvectors. The advantage of the proposed methods over the existing ones is shown via weaker asymptotic requirements for consistency, simulation studies and a real data example.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"78 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}