Statistics and Computing最新文献_第3页

A comprehensive comparison of goodness-of-fit tests for logistic regression models 逻辑回归模型拟合优度检验的综合比较

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-30 DOI: 10.1007/s11222-024-10487-5

Huiling Liu, Xinmin Li, Feifei Chen, Wolfgang Härdle, Hua Liang

{"title":"A comprehensive comparison of goodness-of-fit tests for logistic regression models","authors":"Huiling Liu, Xinmin Li, Feifei Chen, Wolfgang Härdle, Hua Liang","doi":"10.1007/s11222-024-10487-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10487-5","url":null,"abstract":"We introduce a projection-based test for assessing logistic regression models using the empirical residual marked empirical process and suggest a model-based bootstrap procedure to calculate critical values. We comprehensively compare this test and Stute and Zhu’s test with several commonly used goodness-of-fit (GoF) tests: the Hosmer–Lemeshow test, modified Hosmer–Lemeshow test, Osius–Rojek test, and Stukel test for logistic regression models in terms of type I error control and power performance in small ((n=50)), moderate ((n=100)), and large ((n=500)) sample sizes. We assess the power performance for two commonly encountered situations: nonlinear and interaction departures from the null hypothesis. All tests except the modified Hosmer–Lemeshow test and Osius–Rojek test have the correct size in all sample sizes. The power performance of the projection based test consistently outperforms its competitors. We apply these tests to analyze an AIDS dataset and a cancer dataset. For the former, all tests except the projection-based test do not reject a simple linear function in the logit, which has been illustrated to be deficient in the literature. For the latter dataset, the Hosmer–Lemeshow test, modified Hosmer–Lemeshow test, and Osius–Rojek test fail to detect the quadratic form in the logit, which was detected by the Stukel test, Stute and Zhu’s test, and the projection-based test.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"4 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

New forest-based approaches for sufficient dimension reduction 基于森林的充分降维新方法

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-30 DOI: 10.1007/s11222-024-10482-w

Shuang Dai, Ping Wu, Zhou Yu

{"title":"New forest-based approaches for sufficient dimension reduction","authors":"Shuang Dai, Ping Wu, Zhou Yu","doi":"10.1007/s11222-024-10482-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10482-w","url":null,"abstract":"Sufficient dimension reduction (SDR) primarily aims to reduce the dimensionality of high-dimensional predictor variables while retaining essential information about the responses. Traditional SDR methods typically employ kernel weighting functions, which unfortunately makes them susceptible to the curse of dimensionality. To address this issue, we in this paper propose novel forest-based approaches for SDR that utilize a locally adaptive kernel generated by Mondrian forests. Overall, our work takes the perspective of Mondrian forest as an adaptive weighted kernel technique for SDR problems. In the central mean subspace model, by integrating the methods from Xia et al. (J R Stat Soc Ser B (Stat Methodol) 64(3):363–410, 2002. https://doi.org/10.1111/1467-9868.03411) with Mondrian forest weights, we suggest the forest-based outer product of gradients estimation (mf-OPG) and the forest-based minimum average variance estimation (mf-MAVE). Moreover, we substitute the kernels used in nonparametric density function estimations (Xia in Ann Stat 35(6):2654–2690, 2007. https://doi.org/10.1214/009053607000000352), targeting the central subspace, with Mondrian forest weights. These techniques are referred to as mf-dOPG and mf-dMAVE, respectively. Under regularity conditions, we establish the asymptotic properties of our forest-based estimators, as well as the convergence of the affiliated algorithms. Through simulation studies and analysis of fully observable data, we demonstrate substantial improvements in computational efficiency and predictive accuracy of our proposals compared with the traditional counterparts.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"57 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

SB-ETAS: using simulation based inference for scalable, likelihood-free inference for the ETAS model of earthquake occurrences SB-ETAS：使用基于模拟的推理方法对地震发生的 ETAS 模型进行可扩展的无似然推理

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-29 DOI: 10.1007/s11222-024-10486-6

Samuel Stockman, Daniel J. Lawson, Maximilian J. Werner

{"title":"SB-ETAS: using simulation based inference for scalable, likelihood-free inference for the ETAS model of earthquake occurrences","authors":"Samuel Stockman, Daniel J. Lawson, Maximilian J. Werner","doi":"10.1007/s11222-024-10486-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10486-6","url":null,"abstract":"The rapid growth of earthquake catalogs, driven by machine learning-based phase picking and denser seismic networks, calls for the application of a broader range of models to determine whether the new data enhances earthquake forecasting capabilities. Additionally, this growth demands that existing forecasting models efficiently scale to handle the increased data volume. Approximate inference methods such as inlabru, which is based on the Integrated nested Laplace approximation, offer improved computational efficiencies and the ability to perform inference on more complex point-process models compared to traditional MCMC approaches. We present SB-ETAS: a simulation based inference procedure for the epidemic-type aftershock sequence (ETAS) model. This approximate Bayesian method uses sequential neural posterior estimation (SNPE) to learn posterior distributions from simulations, rather than typical MCMC sampling using the likelihood. On synthetic earthquake catalogs, SB-ETAS provides better coverage of ETAS posterior distributions compared with inlabru. Furthermore, we demonstrate that using a simulation based procedure for inference improves the scalability from (mathcal {O}(n^2)) to (mathcal {O}(nlog n)). This makes it feasible to fit to very large earthquake catalogs, such as one for Southern California dating back to 1981. SB-ETAS can find Bayesian estimates of ETAS parameters for this catalog in less than 10 h on a standard laptop, a task that would have taken over 2 weeks using MCMC. Beyond the standard ETAS model, this simulation based framework allows earthquake modellers to define and infer parameters for much more complex models by removing the need to define a likelihood function.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"22 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Sparse Bayesian learning using TMB (Template Model Builder) 使用 TMB（模板模型生成器）进行稀疏贝叶斯学习

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-28 DOI: 10.1007/s11222-024-10476-8

Ingvild M. Helgøy, Hans J. Skaug, Yushu Li

{"title":"Sparse Bayesian learning using TMB (Template Model Builder)","authors":"Ingvild M. Helgøy, Hans J. Skaug, Yushu Li","doi":"10.1007/s11222-024-10476-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10476-8","url":null,"abstract":"Sparse Bayesian Learning, and more specifically the Relevance Vector Machine (RVM), can be used in supervised learning for both classification and regression problems. Such methods are particularly useful when applied to big data in order to find a sparse (in weight space) representation of the model. This paper demonstrates that the Template Model Builder (TMB) is an accurate and flexible computational framework for implementation of sparse Bayesian learning methods.The user of TMB is only required to specify the joint likelihood of the weights and the data, while the Laplace approximation of the marginal likelihood is automatically evaluated to numerical precision. This approximation is in turn used to estimate hyperparameters by maximum marginal likelihood. In order to reduce the computational cost of the Laplace approximation we introduce the notion of an “active set” of weights, and we devise an algorithm for dynamically updating this set until convergence, similar to what is done in other RVM type methods. We implement two different methods using TMB; the RVM and the Probabilistic Feature Selection and Classification Vector Machine method, where the latter also performs feature selection. Experiments based on benchmark data show that our TMB implementation performs comparable to that of the original implementation, but at a lower implementation cost. TMB can also calculate model and prediction uncertainty, by including estimation uncertainty from both latent variables and the hyperparameters. In conclusion, we find that TMB is a flexible tool that facilitates implementation and prototyping of sparse Bayesian methods.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"39 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A new maximum mean discrepancy based two-sample test for equal distributions in separable metric spaces 基于最大均值差异的新的可分离度量空间等分布双样本检验法

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-25 DOI: 10.1007/s11222-024-10483-9

Bu Zhou, Zhi Peng Ong, Jin-Ting Zhang

{"title":"A new maximum mean discrepancy based two-sample test for equal distributions in separable metric spaces","authors":"Bu Zhou, Zhi Peng Ong, Jin-Ting Zhang","doi":"10.1007/s11222-024-10483-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10483-9","url":null,"abstract":"This paper presents a novel two-sample test for equal distributions in separable metric spaces, utilizing the maximum mean discrepancy (MMD). The test statistic is derived from the decomposition of the total variation of data in the reproducing kernel Hilbert space, and can be regarded as a V-statistic-based estimator of the squared MMD. The paper establishes the asymptotic null and alternative distributions of the test statistic. To approximate the null distribution accurately, a three-cumulant matched chi-squared approximation method is employed. The parameters for this approximation are consistently estimated from the data. Additionally, the paper introduces a new data-adaptive method based on the median absolute deviation to select the kernel width of the Gaussian kernel, and a new permutation test combining two different Gaussian kernel width selection methods, which improve the adaptability of the test to different data sets. Fast implementation of the test using matrix calculation is discussed. Extensive simulation studies and three real data examples are presented to demonstrate the good performance of the proposed test.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"3 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Wasserstein principal component analysis for circular measures 用于循环测量的瓦瑟斯坦主成分分析法

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-24 DOI: 10.1007/s11222-024-10473-x

Mario Beraha, Matteo Pegoraro

引用次数: 0

Individualized causal mediation analysis with continuous treatment using conditional generative adversarial networks 利用条件生成对抗网络对连续治疗进行个性化因果中介分析

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-23 DOI: 10.1007/s11222-024-10484-8

Cheng Huan, Xinyuan Song, Hongwei Yuan

{"title":"Individualized causal mediation analysis with continuous treatment using conditional generative adversarial networks","authors":"Cheng Huan, Xinyuan Song, Hongwei Yuan","doi":"10.1007/s11222-024-10484-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10484-8","url":null,"abstract":"Traditional methods used in causal mediation analysis with continuous treatment often focus on estimating average causal effects, limiting their applicability in precision medicine. Machine learning techniques have emerged as a powerful approach for precisely estimating individualized causal effects. This paper proposes a novel method called CGAN-ICMA-CT that leverages Conditional Generative Adversarial Networks (CGANs) to infer individualized causal effects with continuous treatment. We thoroughly investigate the convergence properties of CGAN-ICMA-CT and show that the estimated distribution of our inferential conditional generator converges to the true conditional distribution under mild conditions. We conduct numerical experiments to validate the effectiveness of CGAN-ICMA-CT and compare it with four commonly used methods: linear regression, support vector machine regression, decision tree, and random forest regression. The results demonstrate that CGAN-ICMA-CT outperforms these methods regarding accuracy and precision. Furthermore, we apply the CGAN-ICMA-CT model to the real-world Job Corps dataset, showcasing its practical utility. By utilizing CGAN-ICMA-CT, we estimate the individualized causal effects of the Job Corps program on the number of arrests, providing insights into both direct effects and effects mediated through intermediate variables. Our findings confirm the potential of CGAN-ICMA-CT in advancing individualized causal mediation analysis with continuous treatment in precision medicine settings.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"7 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Taming numerical imprecision by adapting the KL divergence to negative probabilities 通过调整 KL 分歧以适应负概率来控制数值不精确性

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-13 DOI: 10.1007/s11222-024-10480-y

Simon Pfahler, Peter Georg, Rudolf Schill, Maren Klever, Lars Grasedyck, Rainer Spang, Tilo Wettig

引用次数: 0

A Bayesian approach to modeling finite element discretization error 有限元离散化误差建模的贝叶斯方法

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-09 DOI: 10.1007/s11222-024-10463-z

Anne Poot, Pierre Kerfriden, Iuri Rocha, Frans van der Meer

{"title":"A Bayesian approach to modeling finite element discretization error","authors":"Anne Poot, Pierre Kerfriden, Iuri Rocha, Frans van der Meer","doi":"10.1007/s11222-024-10463-z","DOIUrl":"https://doi.org/10.1007/s11222-024-10463-z","url":null,"abstract":"In this work, the uncertainty associated with the finite element discretization error is modeled following the Bayesian paradigm. First, a continuous formulation is derived, where a Gaussian process prior over the solution space is updated based on observations from a finite element discretization. To avoid the computation of intractable integrals, a second, finer, discretization is introduced that is assumed sufficiently dense to represent the true solution field. A prior distribution is assumed over the fine discretization, which is then updated based on observations from the coarse discretization. This yields a posterior distribution with a mean that serves as an estimate of the solution, and a covariance that models the uncertainty associated with this estimate. Two particular choices of prior are investigated: a prior defined implicitly by assigning a white noise distribution to the right-hand side term, and a prior whose covariance function is equal to the Green’s function of the partial differential equation. The former yields a posterior distribution with a mean close to the reference solution, but a covariance that contains little information regarding the finite element discretization error. The latter, on the other hand, yields posterior distribution with a mean equal to the coarse finite element solution, and a covariance with a close connection to the discretization error. For both choices of prior a contradiction arises, since the discretization error depends on the right-hand side term, but the posterior covariance does not. We demonstrate how, by rescaling the eigenvalues of the posterior covariance, this independence can be avoided.","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Roughness regularization for functional data analysis with free knots spline estimation 利用自由结样条估计进行函数数据分析的粗糙度正则化

IF 2.2 2区数学

Statistics and Computing Pub Date : 2024-08-08 DOI: 10.1007/s11222-024-10474-w

Anna De Magistris, Valentina De Simone, Elvira Romano, Gerardo Toraldo

引用次数: 0