Title: SB-ETAS: using simulation based inference for scalable, likelihood-free inference for the ETAS model of earthquake occurrences
Authors: Samuel Stockman, Daniel J. Lawson, Maximilian J. Werner
DOI: 10.1007/s11222-024-10486-6
Journal: Statistics and Computing. Published online: 2024-08-29.
Abstract: The rapid growth of earthquake catalogs, driven by machine learning-based phase picking and denser seismic networks, calls for the application of a broader range of models to determine whether the new data enhances earthquake forecasting capabilities. Additionally, this growth demands that existing forecasting models efficiently scale to handle the increased data volume. Approximate inference methods such as inlabru, which is based on the integrated nested Laplace approximation, offer improved computational efficiency and the ability to perform inference on more complex point-process models compared to traditional MCMC approaches. We present SB-ETAS: a simulation-based inference procedure for the epidemic-type aftershock sequence (ETAS) model. This approximate Bayesian method uses sequential neural posterior estimation (SNPE) to learn posterior distributions from simulations, rather than typical MCMC sampling using the likelihood. On synthetic earthquake catalogs, SB-ETAS provides better coverage of ETAS posterior distributions compared with inlabru. Furthermore, we demonstrate that using a simulation-based procedure for inference improves the scalability from $\mathcal{O}(n^2)$ to $\mathcal{O}(n \log n)$. This makes it feasible to fit to very large earthquake catalogs, such as one for Southern California dating back to 1981. SB-ETAS can find Bayesian estimates of ETAS parameters for this catalog in less than 10 hours on a standard laptop, a task that would have taken over 2 weeks using MCMC. Beyond the standard ETAS model, this simulation-based framework allows earthquake modellers to define and infer parameters for much more complex models by removing the need to define a likelihood function.
{"title":"Sparse Bayesian learning using TMB (Template Model Builder)","authors":"Ingvild M. Helgøy, Hans J. Skaug, Yushu Li","doi":"10.1007/s11222-024-10476-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10476-8","url":null,"abstract":"<p>Sparse Bayesian Learning, and more specifically the Relevance Vector Machine (RVM), can be used in supervised learning for both classification and regression problems. Such methods are particularly useful when applied to big data in order to find a sparse (in weight space) representation of the model. This paper demonstrates that the Template Model Builder (TMB) is an accurate and flexible computational framework for implementation of sparse Bayesian learning methods.The user of TMB is only required to specify the joint likelihood of the weights and the data, while the Laplace approximation of the marginal likelihood is automatically evaluated to numerical precision. This approximation is in turn used to estimate hyperparameters by maximum marginal likelihood. In order to reduce the computational cost of the Laplace approximation we introduce the notion of an “active set” of weights, and we devise an algorithm for dynamically updating this set until convergence, similar to what is done in other RVM type methods. We implement two different methods using TMB; the RVM and the Probabilistic Feature Selection and Classification Vector Machine method, where the latter also performs feature selection. Experiments based on benchmark data show that our TMB implementation performs comparable to that of the original implementation, but at a lower implementation cost. TMB can also calculate model and prediction uncertainty, by including estimation uncertainty from both latent variables and the hyperparameters. In conclusion, we find that TMB is a flexible tool that facilitates implementation and prototyping of sparse Bayesian methods.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"39 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new maximum mean discrepancy based two-sample test for equal distributions in separable metric spaces","authors":"Bu Zhou, Zhi Peng Ong, Jin-Ting Zhang","doi":"10.1007/s11222-024-10483-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10483-9","url":null,"abstract":"<p>This paper presents a novel two-sample test for equal distributions in separable metric spaces, utilizing the maximum mean discrepancy (MMD). The test statistic is derived from the decomposition of the total variation of data in the reproducing kernel Hilbert space, and can be regarded as a V-statistic-based estimator of the squared MMD. The paper establishes the asymptotic null and alternative distributions of the test statistic. To approximate the null distribution accurately, a three-cumulant matched chi-squared approximation method is employed. The parameters for this approximation are consistently estimated from the data. Additionally, the paper introduces a new data-adaptive method based on the median absolute deviation to select the kernel width of the Gaussian kernel, and a new permutation test combining two different Gaussian kernel width selection methods, which improve the adaptability of the test to different data sets. Fast implementation of the test using matrix calculation is discussed. Extensive simulation studies and three real data examples are presented to demonstrate the good performance of the proposed test.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"3 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wasserstein principal component analysis for circular measures","authors":"Mario Beraha, Matteo Pegoraro","doi":"10.1007/s11222-024-10473-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10473-x","url":null,"abstract":"<p>We consider the 2-Wasserstein space of probability measures supported on the unit-circle, and propose a framework for Principal Component Analysis (PCA) for data living in such a space. We build on a detailed investigation of the optimal transportation problem for measures on the unit-circle which might be of independent interest. In particular, building on previously obtained results, we derive an expression for optimal transport maps in (almost) closed form and propose an alternative definition of the tangent space at an absolutely continuous probability measure, together with fundamental characterizations of the associated exponential and logarithmic maps. PCA is performed by mapping data on the tangent space at the Wasserstein barycentre, which we approximate via an iterative scheme, and for which we establish a sufficient a posteriori condition to assess its convergence. Our methodology is illustrated on several simulated scenarios and a real data analysis of measurements of optical nerve thickness.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"12 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Individualized causal mediation analysis with continuous treatment using conditional generative adversarial networks","authors":"Cheng Huan, Xinyuan Song, Hongwei Yuan","doi":"10.1007/s11222-024-10484-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10484-8","url":null,"abstract":"<p>Traditional methods used in causal mediation analysis with continuous treatment often focus on estimating average causal effects, limiting their applicability in precision medicine. Machine learning techniques have emerged as a powerful approach for precisely estimating individualized causal effects. This paper proposes a novel method called CGAN-ICMA-CT that leverages Conditional Generative Adversarial Networks (CGANs) to infer individualized causal effects with continuous treatment. We thoroughly investigate the convergence properties of CGAN-ICMA-CT and show that the estimated distribution of our inferential conditional generator converges to the true conditional distribution under mild conditions. We conduct numerical experiments to validate the effectiveness of CGAN-ICMA-CT and compare it with four commonly used methods: linear regression, support vector machine regression, decision tree, and random forest regression. The results demonstrate that CGAN-ICMA-CT outperforms these methods regarding accuracy and precision. Furthermore, we apply the CGAN-ICMA-CT model to the real-world Job Corps dataset, showcasing its practical utility. By utilizing CGAN-ICMA-CT, we estimate the individualized causal effects of the Job Corps program on the number of arrests, providing insights into both direct effects and effects mediated through intermediate variables. Our findings confirm the potential of CGAN-ICMA-CT in advancing individualized causal mediation analysis with continuous treatment in precision medicine settings.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"7 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Taming numerical imprecision by adapting the KL divergence to negative probabilities
Authors: Simon Pfahler, Peter Georg, Rudolf Schill, Maren Klever, Lars Grasedyck, Rainer Spang, Tilo Wettig
DOI: 10.1007/s11222-024-10480-y
Journal: Statistics and Computing. Published online: 2024-08-13.
Abstract: The Kullback–Leibler (KL) divergence is frequently used in data science. For discrete distributions on large state spaces, approximations of probability vectors may result in a few small negative entries, rendering the KL divergence undefined. We address this problem by introducing a parameterized family of substitute divergence measures, the shifted KL (sKL) divergence measures. Our approach is generic and does not increase the computational overhead. We show that the sKL divergence shares important theoretical properties with the KL divergence and discuss how its shift parameters should be chosen. If Gaussian noise is added to a probability vector, we prove that the average sKL divergence converges to the KL divergence for small enough noise. We also show that our method solves the problem of negative entries in an application from computational oncology, the optimization of Mutual Hazard Networks for cancer progression using tensor-train approximations.
Title: A Bayesian approach to modeling finite element discretization error
Authors: Anne Poot, Pierre Kerfriden, Iuri Rocha, Frans van der Meer
DOI: 10.1007/s11222-024-10463-z
Journal: Statistics and Computing. Published online: 2024-08-09.
Abstract: In this work, the uncertainty associated with the finite element discretization error is modeled following the Bayesian paradigm. First, a continuous formulation is derived, where a Gaussian process prior over the solution space is updated based on observations from a finite element discretization. To avoid the computation of intractable integrals, a second, finer, discretization is introduced that is assumed sufficiently dense to represent the true solution field. A prior distribution is assumed over the fine discretization, which is then updated based on observations from the coarse discretization. This yields a posterior distribution with a mean that serves as an estimate of the solution, and a covariance that models the uncertainty associated with this estimate. Two particular choices of prior are investigated: a prior defined implicitly by assigning a white noise distribution to the right-hand side term, and a prior whose covariance function is equal to the Green's function of the partial differential equation. The former yields a posterior distribution with a mean close to the reference solution, but a covariance that contains little information regarding the finite element discretization error. The latter, on the other hand, yields a posterior distribution with a mean equal to the coarse finite element solution, and a covariance with a close connection to the discretization error. For both choices of prior a contradiction arises, since the discretization error depends on the right-hand side term, but the posterior covariance does not. We demonstrate how, by rescaling the eigenvalues of the posterior covariance, this independence can be avoided.
Title: Roughness regularization for functional data analysis with free knots spline estimation
Authors: Anna De Magistris, Valentina De Simone, Elvira Romano, Gerardo Toraldo
DOI: 10.1007/s11222-024-10474-w
Journal: Statistics and Computing. Published online: 2024-08-08.
Abstract: In the era of big data, an ever-growing volume of information is recorded, either continuously over time or sporadically, at distinct time intervals. Functional Data Analysis (FDA) stands at the cutting edge of this data revolution, offering a powerful framework for handling and extracting meaningful insights from such complex datasets. Currently proposed FDA methods can encounter challenges, especially when dealing with curves of varying shapes, largely because of their strong dependence on data approximation as a key aspect of the analysis process. In this work, we propose a free knots spline estimation method for functional data with two penalty terms and demonstrate its performance by comparing the results of several clustering methods on simulated and real data.
{"title":"Learning variational autoencoders via MCMC speed measures","authors":"Marcel Hirt, Vasileios Kreouzis, Petros Dellaportas","doi":"10.1007/s11222-024-10481-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10481-x","url":null,"abstract":"<p>Variational autoencoders (VAEs) are popular likelihood-based generative models which can be efficiently trained by maximising an evidence lower bound. There has been much progress in improving the expressiveness of the variational distribution to obtain tighter variational bounds and increased generative performance. Whilst previous work has leveraged Markov chain Monte Carlo methods for constructing variational densities, gradient-based methods for adapting the proposal distributions for deep latent variable models have received less attention. This work suggests an entropy-based adaptation for a short-run metropolis-adjusted Langevin or Hamiltonian Monte Carlo (HMC) chain while optimising a tighter variational bound to the log-evidence. Experiments show that this approach yields higher held-out log-likelihoods as well as improved generative metrics. Our implicit variational density can adapt to complicated posterior geometries of latent hierarchical representations arising in hierarchical VAEs.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"130 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The COR criterion for optimal subset selection in distributed estimation","authors":"Guangbao Guo, Haoyue Song, Lixing Zhu","doi":"10.1007/s11222-024-10471-z","DOIUrl":"https://doi.org/10.1007/s11222-024-10471-z","url":null,"abstract":"<p>The problem of selecting an optimal subset in distributed regression is a crucial issue, as each distributed data subset may contain redundant information, which can be attributed to various sources such as outliers, dispersion, inconsistent duplicates, too many independent variables, and excessive data points, among others. Efficient reduction and elimination of this redundancy can help alleviate inconsistency issues for statistical inference. Therefore, it is imperative to track redundancy while measuring and processing data. We develop a criterion for optimal subset selection that is related to Covariance matrices, Observation matrices, and Response vectors (COR). We also derive a novel distributed interval estimation for the proposed criterion and establish the existence of optimal subset length. Finally, numerical experiments are conducted to verify the experimental feasibility of the proposed criterion.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"53 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141882928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}