Statistics and Computing: Latest Articles

Sparse Bayesian learning using TMB (Template Model Builder)
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-28 DOI: 10.1007/s11222-024-10476-8
Ingvild M. Helgøy, Hans J. Skaug, Yushu Li
Abstract: Sparse Bayesian Learning, and more specifically the Relevance Vector Machine (RVM), can be used in supervised learning for both classification and regression problems. Such methods are particularly useful when applied to big data in order to find a sparse (in weight space) representation of the model. This paper demonstrates that the Template Model Builder (TMB) is an accurate and flexible computational framework for implementation of sparse Bayesian learning methods. The user of TMB is only required to specify the joint likelihood of the weights and the data, while the Laplace approximation of the marginal likelihood is automatically evaluated to numerical precision. This approximation is in turn used to estimate hyperparameters by maximum marginal likelihood. In order to reduce the computational cost of the Laplace approximation we introduce the notion of an “active set” of weights, and we devise an algorithm for dynamically updating this set until convergence, similar to what is done in other RVM type methods. We implement two different methods using TMB: the RVM and the Probabilistic Feature Selection and Classification Vector Machine method, where the latter also performs feature selection. Experiments based on benchmark data show that our TMB implementation performs comparably to the original implementation, but at a lower implementation cost. TMB can also calculate model and prediction uncertainty, by including estimation uncertainty from both latent variables and the hyperparameters. In conclusion, we find that TMB is a flexible tool that facilitates implementation and prototyping of sparse Bayesian methods.
Citations: 0
A new maximum mean discrepancy based two-sample test for equal distributions in separable metric spaces
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-25 DOI: 10.1007/s11222-024-10483-9
Bu Zhou, Zhi Peng Ong, Jin-Ting Zhang
Abstract: This paper presents a novel two-sample test for equal distributions in separable metric spaces, utilizing the maximum mean discrepancy (MMD). The test statistic is derived from the decomposition of the total variation of data in the reproducing kernel Hilbert space, and can be regarded as a V-statistic-based estimator of the squared MMD. The paper establishes the asymptotic null and alternative distributions of the test statistic. To approximate the null distribution accurately, a three-cumulant matched chi-squared approximation method is employed. The parameters for this approximation are consistently estimated from the data. Additionally, the paper introduces a new data-adaptive method based on the median absolute deviation to select the kernel width of the Gaussian kernel, and a new permutation test combining two different Gaussian kernel width selection methods, which improve the adaptability of the test to different data sets. Fast implementation of the test using matrix calculation is discussed. Extensive simulation studies and three real data examples are presented to demonstrate the good performance of the proposed test.
Citations: 0
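As a rough illustration of the quantity named in the abstract above, the following sketch computes the standard V-statistic estimate of the squared MMD with a Gaussian kernel. It is not the authors' test: the function names and the fixed `width` parameter are assumptions made for this example, and the paper's chi-squared approximation, data-adaptive width selection, and permutation procedure are not reproduced.

```python
import math

def gaussian_kernel(x, y, width):
    """Gaussian kernel exp(-||x - y||^2 / (2 * width^2)) for two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def mmd2_vstat(xs, ys, width):
    """V-statistic estimate of the squared MMD between samples xs and ys.

    All pairs are used, including each point paired with itself, which is
    what distinguishes the V-statistic from the U-statistic estimator.
    """
    n, m = len(xs), len(ys)
    kxx = sum(gaussian_kernel(a, b, width) for a in xs for b in xs) / (n * n)
    kyy = sum(gaussian_kernel(a, b, width) for a in ys for b in ys) / (m * m)
    kxy = sum(gaussian_kernel(a, b, width) for a in xs for b in ys) / (n * m)
    return kxx + kyy - 2.0 * kxy

if __name__ == "__main__":
    sample_a = [(0.0,), (1.0,), (2.0,)]
    sample_b = [(5.0,), (6.0,), (7.0,)]
    # The width of 1.0 is an arbitrary choice for this example.
    print(mmd2_vstat(sample_a, sample_a, 1.0))  # identical samples
    print(mmd2_vstat(sample_a, sample_b, 1.0))  # well-separated samples
```

The estimate is zero when both samples are identical and grows as the samples separate; turning it into a test additionally requires a null distribution, which is the part the paper addresses.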
Wasserstein principal component analysis for circular measures
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-24 DOI: 10.1007/s11222-024-10473-x
Mario Beraha, Matteo Pegoraro
Abstract: We consider the 2-Wasserstein space of probability measures supported on the unit-circle, and propose a framework for Principal Component Analysis (PCA) for data living in such a space. We build on a detailed investigation of the optimal transportation problem for measures on the unit-circle which might be of independent interest. In particular, building on previously obtained results, we derive an expression for optimal transport maps in (almost) closed form and propose an alternative definition of the tangent space at an absolutely continuous probability measure, together with fundamental characterizations of the associated exponential and logarithmic maps. PCA is performed by mapping data on the tangent space at the Wasserstein barycentre, which we approximate via an iterative scheme, and for which we establish a sufficient a posteriori condition to assess its convergence. Our methodology is illustrated on several simulated scenarios and a real data analysis of measurements of optical nerve thickness.
Citations: 0
Individualized causal mediation analysis with continuous treatment using conditional generative adversarial networks
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-23 DOI: 10.1007/s11222-024-10484-8
Cheng Huan, Xinyuan Song, Hongwei Yuan
Abstract: Traditional methods used in causal mediation analysis with continuous treatment often focus on estimating average causal effects, limiting their applicability in precision medicine. Machine learning techniques have emerged as a powerful approach for precisely estimating individualized causal effects. This paper proposes a novel method called CGAN-ICMA-CT that leverages Conditional Generative Adversarial Networks (CGANs) to infer individualized causal effects with continuous treatment. We thoroughly investigate the convergence properties of CGAN-ICMA-CT and show that the estimated distribution of our inferential conditional generator converges to the true conditional distribution under mild conditions. We conduct numerical experiments to validate the effectiveness of CGAN-ICMA-CT and compare it with four commonly used methods: linear regression, support vector machine regression, decision tree, and random forest regression. The results demonstrate that CGAN-ICMA-CT outperforms these methods regarding accuracy and precision. Furthermore, we apply the CGAN-ICMA-CT model to the real-world Job Corps dataset, showcasing its practical utility. By utilizing CGAN-ICMA-CT, we estimate the individualized causal effects of the Job Corps program on the number of arrests, providing insights into both direct effects and effects mediated through intermediate variables. Our findings confirm the potential of CGAN-ICMA-CT in advancing individualized causal mediation analysis with continuous treatment in precision medicine settings.
Citations: 0
Taming numerical imprecision by adapting the KL divergence to negative probabilities
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-13 DOI: 10.1007/s11222-024-10480-y
Simon Pfahler, Peter Georg, Rudolf Schill, Maren Klever, Lars Grasedyck, Rainer Spang, Tilo Wettig
Abstract: The Kullback–Leibler (KL) divergence is frequently used in data science. For discrete distributions on large state spaces, approximations of probability vectors may result in a few small negative entries, rendering the KL divergence undefined. We address this problem by introducing a parameterized family of substitute divergence measures, the shifted KL (sKL) divergence measures. Our approach is generic and does not increase the computational overhead. We show that the sKL divergence shares important theoretical properties with the KL divergence and discuss how its shift parameters should be chosen. If Gaussian noise is added to a probability vector, we prove that the average sKL divergence converges to the KL divergence for small enough noise. We also show that our method solves the problem of negative entries in an application from computational oncology, the optimization of Mutual Hazard Networks for cancer progression using tensor-train approximations.
Citations: 0
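The problem described in the abstract above can be reproduced in a few lines. The sketch below shows that the ordinary KL divergence fails when the approximating vector has a small negative entry, and that adding a shift to both vectors before taking the logarithm makes the expression well defined. The abstract does not state the sKL formula, so the form used in `shifted_kl`, the single scalar shift `eps`, and all function names are assumptions for illustration only and may differ from the paper's definition.

```python
import math

def kl(p, q):
    """Ordinary KL divergence sum_i p_i * log(p_i / q_i), skipping terms with p_i == 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi != 0)

def shifted_kl(p, q, eps):
    """Assumed illustrative form: shift every entry of both vectors by eps.

    The shift must be large enough that all shifted entries are positive.
    """
    return sum((pi + eps) * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.7, 0.301, -0.001]  # approximation with one small negative entry

    try:
        kl(p, q)
    except ValueError:
        # math.log rejects the negative ratio 0.2 / -0.001
        print("KL divergence is undefined for this q")

    print(shifted_kl(p, q, eps=0.01))  # finite, since -0.001 + 0.01 > 0
```

With this assumed form, the shifted divergence of a vector from itself is zero, mirroring the corresponding property of the KL divergence.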
A Bayesian approach to modeling finite element discretization error
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-09 DOI: 10.1007/s11222-024-10463-z
Anne Poot, Pierre Kerfriden, Iuri Rocha, Frans van der Meer
Abstract: In this work, the uncertainty associated with the finite element discretization error is modeled following the Bayesian paradigm. First, a continuous formulation is derived, where a Gaussian process prior over the solution space is updated based on observations from a finite element discretization. To avoid the computation of intractable integrals, a second, finer, discretization is introduced that is assumed sufficiently dense to represent the true solution field. A prior distribution is assumed over the fine discretization, which is then updated based on observations from the coarse discretization. This yields a posterior distribution with a mean that serves as an estimate of the solution, and a covariance that models the uncertainty associated with this estimate. Two particular choices of prior are investigated: a prior defined implicitly by assigning a white noise distribution to the right-hand side term, and a prior whose covariance function is equal to the Green’s function of the partial differential equation. The former yields a posterior distribution with a mean close to the reference solution, but a covariance that contains little information regarding the finite element discretization error. The latter, on the other hand, yields a posterior distribution with a mean equal to the coarse finite element solution, and a covariance with a close connection to the discretization error. For both choices of prior a contradiction arises, since the discretization error depends on the right-hand side term, but the posterior covariance does not. We demonstrate how, by rescaling the eigenvalues of the posterior covariance, this independence can be avoided.
Citations: 0
AR-ADASYN: angle radius-adaptive synthetic data generation approach for imbalanced learning
IF 1.6, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-08 DOI: 10.1007/s11222-024-10479-5
Hyejoon Park, Hyunjoong Kim
Citations: 0
Roughness regularization for functional data analysis with free knots spline estimation
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-08 DOI: 10.1007/s11222-024-10474-w
Anna De Magistris, Valentina De Simone, Elvira Romano, Gerardo Toraldo
Abstract: In the era of big data, an ever-growing volume of information is recorded, either continuously over time or sporadically, at distinct time intervals. Functional Data Analysis (FDA) stands at the cutting edge of this data revolution, offering a powerful framework for handling and extracting meaningful insights from such complex datasets. The currently proposed FDA methods can often encounter challenges, especially when dealing with curves of varying shapes. This can largely be attributed to these methods’ strong dependence on data approximation as a key aspect of the analysis process. In this work, we propose a free knots spline estimation method for functional data with two penalty terms and demonstrate its performance by comparing the results of several clustering methods on simulated and real data.
Citations: 0
Learning variational autoencoders via MCMC speed measures
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-06 DOI: 10.1007/s11222-024-10481-x
Marcel Hirt, Vasileios Kreouzis, Petros Dellaportas
Abstract: Variational autoencoders (VAEs) are popular likelihood-based generative models which can be efficiently trained by maximising an evidence lower bound. There has been much progress in improving the expressiveness of the variational distribution to obtain tighter variational bounds and increased generative performance. Whilst previous work has leveraged Markov chain Monte Carlo methods for constructing variational densities, gradient-based methods for adapting the proposal distributions for deep latent variable models have received less attention. This work suggests an entropy-based adaptation for a short-run Metropolis-adjusted Langevin or Hamiltonian Monte Carlo (HMC) chain while optimising a tighter variational bound to the log-evidence. Experiments show that this approach yields higher held-out log-likelihoods as well as improved generative metrics. Our implicit variational density can adapt to complicated posterior geometries of latent hierarchical representations arising in hierarchical VAEs.
Citations: 0
The COR criterion for optimal subset selection in distributed estimation
IF 2.2, Zone 2, Mathematics
Statistics and Computing Pub Date: 2024-08-02 DOI: 10.1007/s11222-024-10471-z
Guangbao Guo, Haoyue Song, Lixing Zhu
Abstract: The problem of selecting an optimal subset in distributed regression is a crucial issue, as each distributed data subset may contain redundant information, which can be attributed to various sources such as outliers, dispersion, inconsistent duplicates, too many independent variables, and excessive data points, among others. Efficient reduction and elimination of this redundancy can help alleviate inconsistency issues for statistical inference. Therefore, it is imperative to track redundancy while measuring and processing data. We develop a criterion for optimal subset selection that is related to Covariance matrices, Observation matrices, and Response vectors (COR). We also derive a novel distributed interval estimation for the proposed criterion and establish the existence of optimal subset length. Finally, numerical experiments are conducted to verify the experimental feasibility of the proposed criterion.
Citations: 0