{"title":"Approximate bayesian inference for geostatistical generalised linear models","authors":"E. Evangelou","doi":"10.3934/FODS.2019002","DOIUrl":"https://doi.org/10.3934/FODS.2019002","url":null,"abstract":"The aim of this paper is to bring together recent developments in Bayesian generalised linear mixed models and geostatistics. We focus on approximate methods on both areas. A technique known as full-scale approximation, proposed by Sang and Huang (2012) for improving the computational drawbacks of large geostatistical data, is incorporated into the INLA methodology, used for approximate Bayesian inference. We also discuss how INLA can be used for approximating the posterior distribution of transformations of parameters, useful for practical applications. Issues regarding the choice of the parameters of the approximation such as the knots and taper range are also addressed. Emphasis is given in applications in the context of disease mapping by illustrating the methodology for modelling the loa loa prevalence in Cameroon and malaria in the Gambia.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44770194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combinatorial Hodge theory for equitable kidney paired donation","authors":"Joshua L. Mike, V. Maroulas","doi":"10.3934/FODS.2019004","DOIUrl":"https://doi.org/10.3934/FODS.2019004","url":null,"abstract":"Kidney Paired Donation (KPD) is a system whereby incompatible patient-donor pairs (PD pairs) are entered into a pool to find compatible cyclic kidney exchanges where each pair gives and receives a kidney. The donation allocation decision problem for a KPD pool has traditionally been viewed within an economic theory and integer-programming framework. While previous allocation schema work well to donate the maximum number of kidneys at a specific time, certain subgroups of patients are rarely matched in such an exchange. Consequently, these methods lead to systematic inequity in the exchange, where many patients are rejected a kidney repeatedly. Our goal is to investigate inequity within the distribution of kidney allocation among patients, and to present an algorithm which minimizes allocation disparities. The method presented is inspired by cohomology and describes the cyclic structure in a kidney exchange efficiently; this structure is then used to search for an equitable kidney allocation. Another key result of our approach is a score function defined on PD pairs which measures cycle disparity within a KPD pool; i.e., this function measures the relative chance for each PD pair to take part in the kidney exchange if cycles are chosen uniformly. Specifically, we show that PD pairs with underdemanded donors or highly sensitized patients have lower scores than typical PD pairs. Furthermore, our results demonstrate that PD pair score and the chance to obtain a kidney are positively correlated when allocation is done by utility-optimal integer programming methods. In contrast, the chance to obtain a kidney through our method is independent of score, and thus unbiased in this regard.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44209556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Particle filters for inference of high-dimensional multivariate stochastic volatility models with cross-leverage effects","authors":"Yaxian Xu, A. Jasra","doi":"10.3934/fods.2019003","DOIUrl":"https://doi.org/10.3934/fods.2019003","url":null,"abstract":"Multivariate stochastic volatility models are a popular and well-known class of models in the analysis of financial time series because of their abilities to capture the important stylized facts of financial returns data. We consider the problems of filtering distribution estimation and also marginal likelihood calculation for multivariate stochastic volatility models with cross-leverage effects in the high dimensional case, that is when the number of financial time series that we analyze simultaneously (denoted by $d$) is large. The standard particle filter has been widely used in the literature to solve these intractable inference problems. It has excellent performance in low to moderate dimensions, but collapses in the high dimensional case. In this article, two new and advanced particle filters proposed in [4], named the space-time particle filter and the marginal space-time particle filter, are explored for these estimation problems. The better performance in both the accuracy and stability for the two advanced particle filters are shown using simulation and empirical studies in comparison with the standard particle filter. In addition, Bayesian static model parameter estimation problem is considered with the advances in particle Markov chain Monte Carlo methods. The particle marginal Metropolis-Hastings algorithm is applied together with the likelihood estimates from the space-time particle filter to infer the static model parameter successfully when that using the likelihood estimates from the standard particle filter fails.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43334711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral methods to study the robustness of residual neural networks with infinite layers","authors":"T. Trimborn, Stephan Gerster, G. Visconti","doi":"10.3934/fods.2020012","DOIUrl":"https://doi.org/10.3934/fods.2020012","url":null,"abstract":"Recently, neural networks (NN) with an infinite number of layers have been introduced. Especially for these very large NN the training procedure is very expensive. Hence, there is interest to study their robustness with respect to input data to avoid unnecessarily retraining the network. Typically, model-based statistical inference methods, e.g. Bayesian neural networks, are used to quantify uncertainties. Here, we consider a special class of residual neural networks and we study the case, when the number of layers can be arbitrarily large. Then, kinetic theory allows to interpret the network as a dynamical system, described by a partial differential equation. We study the robustness of the mean-field neural network with respect to perturbations in initial data by applying UQ approaches on the loss functions.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70247997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Issues using logistic regression with class imbalance, with a case study from credit risk modelling","authors":"Yazhe Li, T. Bellotti, N. Adams","doi":"10.3934/fods.2019016","DOIUrl":"https://doi.org/10.3934/fods.2019016","url":null,"abstract":"The class imbalance problem arises in two-class classification problems, when the less frequent (minority) class is observed much less than the majority class. This characteristic is endemic in many problems such as modeling default or fraud detection. Recent work by Owen [ 19 ] has shown that, in a theoretical context related to infinite imbalance, logistic regression behaves in such a way that all data in the rare class can be replaced by their mean vector to achieve the same coefficient estimates. We build on Owen's results to show the phenomenon remains true for both weighted and penalized likelihood methods. Such results suggest that problems may occur if there is structure within the rare class that is not captured by the mean vector. We demonstrate this problem and suggest a relabelling solution based on clustering the minority class. In a simulation and a real mortgage dataset, we show that logistic regression is not able to provide the best out-of-sample predictive performance and that an approach that is able to model underlying structure in the minority class is often superior.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70247842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Background material","authors":"","doi":"10.1090/surv/236/02","DOIUrl":"https://doi.org/10.1090/surv/236/02","url":null,"abstract":"","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"60690335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Markov chain simulation for multilevel Monte Carlo","authors":"A. Jasra, K. Law, Yaxian Xu","doi":"10.3934/FODS.2021004","DOIUrl":"https://doi.org/10.3934/FODS.2021004","url":null,"abstract":"This paper considers a new approach to using Markov chain Monte Carlo (MCMC) in contexts where one may adopt multilevel (ML) Monte Carlo. The underlying problem is to approximate expectations w.r.t. an underlying probability measure that is associated to a continuum problem, such as a continuous-time stochastic process. It is then assumed that the associated probability measure can only be used (e.g. sampled) under a discretized approximation. In such scenarios, it is known that to achieve a target error, the computational effort can be reduced when using MLMC relative to exact sampling from the most accurate discretized probability. The ideas rely upon introducing hierarchies of the discretizations where less accurate approximations cost less to compute, and using an appropriate collapsing sum expression for the target expectation. If a suitable coupling of the probability measures in the hierarchy is achieved, then a reduction in cost is possible. This article focused on the case where exact sampling from such coupling is not possible. We show that one can construct suitably coupled MCMC kernels when given only access to MCMC kernels which are invariant with respect to each discretized probability measure. We prove, under assumptions, that this coupled MCMC approach in a ML context can reduce the cost to achieve a given error, relative to exact sampling. Our approach is illustrated on a numerical example.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47508531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum topological data analysis with continuous variables","authors":"G. Siopsis","doi":"10.3934/fods.2019017","DOIUrl":"https://doi.org/10.3934/fods.2019017","url":null,"abstract":"I introduce a continuous-variable quantum topological data algorithm. The goal of the quantum algorithm is to calculate the Betti numbers in persistent homology which are the dimensions of the kernel of the combinatorial Laplacian. I accomplish this task with the use of qRAM to create an oracle which organizes sets of data. I then perform a continuous-variable phase estimation on a Dirac operator to get a probability distribution with eigenvalue peaks. The results also leverage an implementation of continuous-variable conditional swap gate.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49129969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling uncertainty using stochastic transport noise in a 2-layer quasi-geostrophic model","authors":"C. Cotter, D. Crisan, Darryl D. Holm, Wei Pan, I. Shevchenko","doi":"10.3934/fods.2020010","DOIUrl":"https://doi.org/10.3934/fods.2020010","url":null,"abstract":"The stochastic variational approach for geophysical fluid dynamics was introduced by Holm (Proc Roy Soc A, 2015) as a framework for deriving stochastic parameterisations for unresolved scales. This paper applies the variational stochastic parameterisation in a two-layer quasi-geostrophic model for a $\\beta$-plane channel flow configuration. We present a new method for estimating the stochastic forcing (used in the parameterisation) to approximate unresolved components using data from the high resolution deterministic simulation, and describe a procedure for computing physically-consistent initial conditions for the stochastic model. We also quantify uncertainty of coarse grid simulations relative to the fine grid ones in homogeneous (teamed with small-scale vortices) and heterogeneous (featuring horizontally elongated large-scale jets) flows, and analyse how the spread of stochastic solutions depends on different parameters of the model. The parameterisation is tested by comparing it with the true eddy-resolving solution that has reached some statistical equilibrium and the deterministic solution modelled on a low-resolution grid. The results show that the proposed parameterisation significantly depends on the resolution of the stochastic model and gives good ensemble performance for both homogeneous and heterogeneous flows, and the parameterisation lays solid foundations for data assimilation.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43640188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized learning of the second-moment matrix of a smooth function","authors":"Armin Eftekhari, M. Wakin, Ping Li, P. Constantine","doi":"10.3934/fods.2019015","DOIUrl":"https://doi.org/10.3934/fods.2019015","url":null,"abstract":"Consider an open set $\\mathbb{D}\\subseteq\\mathbb{R}^n$, equipped with a probability measure $\\mu$. An important characteristic of a smooth function $f:\\mathbb{D}\\rightarrow\\mathbb{R}$ is its second-moment matrix $\\Sigma_{\\mu}:=\\int \\nabla f(x) \\nabla f(x)^* \\mu(dx) \\in\\mathbb{R}^{n\\times n}$, where $\\nabla f(x)\\in\\mathbb{R}^n$ is the gradient of $f(\\cdot)$ at $x\\in\\mathbb{D}$ and $*$ stands for transpose. For instance, the span of the leading $r$ eigenvectors of $\\Sigma_{\\mu}$ forms an active subspace of $f(\\cdot)$, which contains the directions along which $f(\\cdot)$ changes the most and is of particular interest in ridge approximation. In this work, we propose a simple algorithm for estimating $\\Sigma_{\\mu}$ from random point evaluations of $f(\\cdot)$ without imposing any structural assumptions on $\\Sigma_{\\mu}$. Theoretical guarantees for this algorithm are established with the aid of the same technical tools that have proved valuable in the context of covariance matrix estimation from partial measurements.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70247801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}