Computational Statistics & Data Analysis: Latest Articles

Random effects misspecification and its consequences for prediction in generalized linear mixed models
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 213, Article 108254. Pub Date: 2025-07-29. DOI: 10.1016/j.csda.2025.108254
Quan Vu, Francis K.C. Hui, Samuel Muller, A.H. Welsh
Abstract: When fitting generalized linear mixed models, choosing the random effects distribution is an important decision. As random effects are unobserved, misspecification of their distribution is a real possibility. Thus, the consequences of random effects misspecification for point prediction and prediction inference of random effects in generalized linear mixed models need to be investigated. A combination of theory, simulation, and a real application is used to explore the effect of using the common normality assumption for the random effects distribution when the correct specification is a mixture of normal distributions, focusing on the impacts on point prediction, mean squared prediction errors, and prediction intervals. Results show that the level of shrinkage for the predicted random effects can differ greatly under the two random effect distributions, and so is susceptible to misspecification. Also, the unconditional mean squared prediction errors for the random effects are almost always larger under the misspecified normal random effects distribution, while results for the mean squared prediction errors conditional on the random effects are more complicated but remain generally larger under the misspecified distribution (especially when the true random effect is close to the mean of one of the component distributions in the true mixture distribution). Results for prediction intervals indicate that the overall coverage probability is, in contrast, not greatly impacted by misspecification. It is concluded that misspecifying the random effects distribution can affect prediction of random effects, and greater caution is recommended when adopting the normality assumption in generalized linear mixed models.
Citations: 0
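The shrinkage contrast described in this abstract can be illustrated in the simplest special case. The sketch below is not the authors' code or setting: it assumes a Gaussian response with identity link, a single random intercept, no fixed effects, and known variances, so that both predictors have closed forms. The function names and the numeric values are hypothetical, chosen only for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def predict_normal(ybar, n, sigma2, tau2):
    """Posterior mean of b under b ~ N(0, tau2), given the cluster mean ybar of n observations."""
    shrink = tau2 / (tau2 + sigma2 / n)  # shrinkage factor in (0, 1)
    return shrink * ybar

def predict_mixture(ybar, n, sigma2, weights, means, variances):
    """Posterior mean of b when b follows a finite mixture of normals."""
    noise = sigma2 / n
    # Posterior component weights: prior weight times the marginal density of ybar.
    post = [w * normal_pdf(ybar, m, v + noise)
            for w, m, v in zip(weights, means, variances)]
    total = sum(post)
    # Within each component, ybar is shrunk toward that component's own mean.
    comp = [m + v / (v + noise) * (ybar - m) for m, v in zip(means, variances)]
    return sum(p / total * c for p, c in zip(post, comp))

# Illustrative values (assumptions): a symmetric two-component mixture and a
# normal distribution matched to the same mean (0) and variance (0.25 + 2^2).
ybar, n, sigma2 = 2.0, 1, 1.0
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [0.25, 0.25]
tau2 = 0.25 + 4.0

print("normal prediction :", round(predict_normal(ybar, n, sigma2, tau2), 3))
print("mixture prediction:", round(predict_mixture(ybar, n, sigma2, weights, means, variances), 3))
```

In this toy case the normal assumption pulls the cluster mean toward zero, while the mixture pulls it toward the nearest component mean, so the two predictions differ even though both distributions share the same first two moments.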
Bayesian optimization sequential surrogate (BOSS) algorithm: Fast Bayesian inference for a broad class of Bayesian hierarchical models
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 213, Article 108253. Pub Date: 2025-07-23. DOI: 10.1016/j.csda.2025.108253
Dayi Li, Ziang Zhang
Abstract: Approximate Bayesian inference based on Laplace approximation and quadrature has become increasingly popular for its efficiency in fitting latent Gaussian models (LGM). However, many useful models can only be fitted as LGMs if some conditioning parameters are fixed. Such models are termed conditional LGMs, with examples including change-point detection, non-linear regression, and many others. Existing methods for fitting conditional LGMs rely on grid search or sampling-based approaches to explore the posterior density of the conditioning parameters; both require a large number of evaluations of the unnormalized posterior density of the conditioning parameters. Since each evaluation requires fitting a separate LGM, these methods become computationally prohibitive beyond simple scenarios. In this work, the Bayesian Optimization Sequential Surrogate (BOSS) algorithm is introduced, which combines Bayesian optimization with approximate Bayesian inference methods to significantly reduce the computational resources required for fitting conditional LGMs. With orders of magnitude fewer evaluations than those required by the existing methods, BOSS efficiently generates sequential design points that capture the majority of the posterior mass of the conditioning parameters and subsequently yields an accurate surrogate posterior distribution that can be easily normalized. The efficiency, accuracy, and practical utility of BOSS are demonstrated through extensive simulation studies and real-world applications in epidemiology, environmental sciences, and astrophysics.
Citations: 0
GMM estimation of fixed effects partially linear additive SAR model with space-time correlated disturbances
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 213, Article 108252. Pub Date: 2025-07-22. DOI: 10.1016/j.csda.2025.108252
Bogui Li, Jianbao Chen
Abstract: In order to study the ubiquitous space-time panel data in the real world, a fixed effects partially linear additive spatial autoregressive (SAR) model with space-time correlated disturbances is proposed. Compared to the linear panel model with space-time correlated disturbances, it can simultaneously capture substantial spatial dependence of response, linearity and nonlinearity between response and regressors, spatial and serial correlations of disturbances, and avoid the “curse of dimensionality” of nonparametric regression. By using B-splines to fit additive components and constructing linear and quadratic moment conditions which incorporate information in disturbances, the generalized method of moments (GMM) estimators of unknown parameters and additive components are obtained. Under certain regularity assumptions, it is proved that the GMM estimators are consistent and asymptotically normal. Furthermore, the asymptotically efficient best GMM estimators under normality are derived. Monte Carlo simulation and empirical analysis illustrate that the developed estimation method has good finite sample performance and application prospects.
Citations: 0
Inferring the dynamics of quasi-reaction systems via nonlinear local mean-field approximations
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 213, Article 108251. Pub Date: 2025-07-22. DOI: 10.1016/j.csda.2025.108251
Matteo Framba, Veronica Vinciotti, Ernst C. Wit
Abstract: Parameter estimation of kinetic rates in stochastic quasi-reaction systems can be challenging, particularly when the time gap between consecutive measurements is large. Local linear approximation approaches account for the stochasticity in the system but fail to capture the intrinsically nonlinear nature of the mean dynamics of the process. Moreover, the mean dynamics of a quasi-reaction system can be described by a system of ODEs, which have an explicit solution only for simple unitary systems. An approximate analytical solution is derived for generic quasi-reaction systems via a first-order Taylor approximation of the hazard rate. This allows a nonlinear forward prediction of the future dynamics given the current state of the system. Predictions and corresponding observations are embedded in a nonlinear least-squares approach for parameter estimation. The performance of the algorithm is compared to existing methods via a simulation study. Besides the generality of the approach in the specification of the quasi-reaction system and the gains in computational efficiency, the results show an improvement in the kinetic rate estimation, particularly for data observed at large time intervals. Additionally, the availability of an explicit solution makes the method robust to stiffness, which is often present in biological systems. Application to Rhesus Macaque data illustrates the use of the method in the study of cell differentiation.
Citations: 0
Sample-specific cooperative learning integrating heterogeneous radiomics and pathomics data
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 213, Article 108250. Pub Date: 2025-07-21. DOI: 10.1016/j.csda.2025.108250
Shih-Ting Huang, Graham A. Colditz, Shu Jiang
Abstract: Multi-omics analysis offers unparalleled insights into the interlinked molecular interactions that govern the underlying biological processes. In the era of big data, driven by the emergence of high-throughput technologies, it is possible to gain a more comprehensive and detailed understanding of complex systems. Nevertheless, the challenges lie in developing methods to effectively integrate and analyze this wealth of data. This challenge is even more apparent when the type of -omics data (e.g., pathomics) lacks pixel-to-pixel or region-to-region correspondence across the population. A novel sample-specific cooperative learning framework is introduced, designed to adaptively manage diverse multi-omics data types, even when there is no direct correspondence between regions. The proposed framework is defined for both continuous and categorical outcomes, with theoretical guarantees based on finite samples. Model performance is demonstrated and compared with existing methods using real-world datasets involving proteomics and metabolomics, and radiomics and pathomics.
Citations: 0
Boosting interaction tree stumps for modeling interactions
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 213, Article 108247. Pub Date: 2025-07-16. DOI: 10.1016/j.csda.2025.108247
Michael Lau, Tamara Schikowski, Holger Schwender
Abstract: Incorporating interaction effects is essential for accurately modeling complex underlying relationships in many applications. Often, not only strong predictive performance is desired, but also the interpretability of the resulting model. This need is evident in areas such as epidemiology, in which uncovering the interplay of biological mechanisms is critical for understanding complex diseases. Classical linear models, frequently used for constructing genetic risk scores, fail to capture interaction effects autonomously, while modern machine learning methods such as gradient boosting often produce black-box models that lack interpretability. Existing linear interaction models are largely limited to considering two-way interactions. To address these limitations, a novel statistical learning method, BITS (Boosting Interaction Tree Stumps), is introduced to construct linear models while autonomously detecting and incorporating interaction effects. BITS uses gradient boosting on interaction tree stumps, i.e., decision trees with a single split, where in BITS this split can possibly occur on an interaction term. A branch-and-bound approach is employed in BITS to discard weakly predictive terms. For high-dimensional data, a hybrid search strategy combining greedy and exhaustive approaches is proposed. Regularization techniques are integrated to prevent overfitting and the inclusion of spurious interaction effects. Simulation studies and real data applications demonstrate that BITS produces interpretable models with strong predictive performance. Moreover, in the simulation study, BITS primarily identifies truly influential terms.
Citations: 0
On Jeffreys's cardioid distribution
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 213, Article 108248. Pub Date: 2025-07-16. DOI: 10.1016/j.csda.2025.108248
Arthur Pewsey
Abstract: The cardioid distribution, despite being one of the fundamental models for circular data, has received limited attention both methodologically and in terms of its implementation in R. To redress these shortcomings, published results on the model are summarized, corrected and extended, and the scope and limitations of the existing support for the model in R identified. A thorough investigation into the performance of trigonometric moment and maximum likelihood based approaches to point and interval estimation of the model's location and concentration parameters is presented, and goodness-of-fit techniques outlined. A suite of reliable R functions is provided for the model's practical application. The application of the proposed inferential methods and R functions is illustrated by an analysis of palaeocurrent cross-bed azimuths.
Citations: 0
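For readers unfamiliar with the model, the cardioid density and its trigonometric moment estimators can be sketched briefly. This is an illustrative Python sketch, not the R functions supplied with the paper. It uses the standard parameterization f(θ) = (1 + 2ρ cos(θ − μ)) / (2π) with 0 ≤ ρ ≤ 1/2, under which the mean direction is μ and the mean resultant length is ρ. The function names, the rejection sampler, and the truncation of the estimate at 1/2 are assumptions made for this example.

```python
import math
import random

def cardioid_pdf(theta, mu, rho):
    """Cardioid density on the circle with mean direction mu and 0 <= rho <= 1/2."""
    if not 0.0 <= rho <= 0.5:
        raise ValueError("rho must lie in [0, 0.5]")
    return (1.0 + 2.0 * rho * math.cos(theta - mu)) / (2.0 * math.pi)

def sample_cardioid(n, mu, rho, rng):
    """Draw n angles by rejection sampling with a uniform proposal on the circle."""
    out = []
    while len(out) < n:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Accept with probability f(theta) / max f.
        if rng.random() * (1.0 + 2.0 * rho) <= 1.0 + 2.0 * rho * math.cos(theta - mu):
            out.append(theta)
    return out

def trig_moment_estimates(angles):
    """Estimate (mu, rho) from the first sample trigonometric moment."""
    n = len(angles)
    c_bar = sum(math.cos(a) for a in angles) / n
    s_bar = sum(math.sin(a) for a in angles) / n
    mu_hat = math.atan2(s_bar, c_bar) % (2.0 * math.pi)
    rho_hat = min(math.hypot(c_bar, s_bar), 0.5)  # keep the estimate in the parameter space
    return mu_hat, rho_hat

rng = random.Random(0)
angles = sample_cardioid(20000, mu=1.0, rho=0.3, rng=rng)
print(trig_moment_estimates(angles))
```

The moment estimators need only sample means of sines and cosines, which is why they are a common starting point before maximum likelihood refinement.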
Modeling continuous distributions in hybrid Bayesian networks using mixtures of polynomials with tails
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 212, Article 108246. Pub Date: 2025-07-11. DOI: 10.1016/j.csda.2025.108246
J.C. Luengo, D. Ramos-López, R. Rumí
Abstract: A new approach to modeling continuous distributions in hybrid Bayesian networks (BNs) is presented. It is based on Mixtures of Polynomials (MoPs) with tails, named tMoPs. This proposal is a variation of the usual MoP model, now including tails and several other improvements in the learning process. The adequate modeling of tails in variable distributions is relevant theoretically and for many real applications, in which rare phenomena may have a great impact. The proposed approach has been designed to exploit the flexibility of the tMoP model to fit different continuous data distributions. This is especially relevant in those distributions with zones of density close to zero, in which polynomial fitting may be difficult. In these situations, tMoPs allow a polynomial fit in parts with higher density and the use of tails in areas with lower density. This permits a better global fit, without loss of overall accuracy and yielding a relatively simple density function. Learning algorithms for tMoPs conditional probability distributions with up to two parents of any type are developed. These tMoPs may be integrated into hybrid Bayesian networks to represent conditional probability distributions, thus allowing probabilistic reasoning, such as causal inference, sensitivity analysis, and other decision-making operations. The suitability of tMoPs is evaluated in several ways, using a large set of real datasets with data of different natures. The experiments include: the analysis of goodness-of-fit with several continuous and pseudo-continuous variables, the optimization of certain parameters and the effect of variable selection and graph structure when using tMoPs in BNs, and finally the evaluation of the predictive ability of hybrid BNs based on tMoPs in classification and regression. Results show the good behavior of our proposal, with the tMoP hybrid Bayesian networks being equally accurate or outperforming other techniques in most scenarios, in addition to providing a more informative and convenient probabilistic model.
Citations: 0
Kernel density estimation for compositional data with zeros via hypersphere mapping
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 212, Article 108249. Pub Date: 2025-07-11. DOI: 10.1016/j.csda.2025.108249
Changwon Yoon, Hyunbin Choi, Jeongyoun Ahn
Abstract: Compositional data (measurements of relative proportions among components) arise frequently in fields ranging from chemometrics to bioinformatics. While density estimation of such data provides crucial insights into their underlying patterns and enables comparative analyses across groups, existing nonparametric approaches are limited, particularly in handling zero components that commonly occur in real-world datasets. We propose a novel kernel density estimation (KDE) method for compositional data that naturally accommodates zero components by exploiting the geometric correspondence between simplices and hyperspheres. This connection to spherical KDE allows us to establish theoretical guarantees, including consistency of the estimator. Through extensive simulations and real data analyses, we demonstrate our method's advantages over existing approaches, particularly in scenarios involving zero components.
Citations: 0
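The abstract does not spell out the mapping or the kernel, so the following is only a sketch of the general idea under stated assumptions. It uses the square-root transform, a common way to send a point of the simplex to the positive orthant of the unit sphere, and an unnormalized von Mises-Fisher-type kernel exp(κ(⟨u, v⟩ − 1)). Both choices, along with the names `to_sphere` and `kernel_score` and the toy data, are hypothetical and may differ from the authors' estimator.

```python
import math

def to_sphere(composition):
    """Map a composition (nonnegative parts, positive sum) to the unit sphere via square roots."""
    total = sum(composition)
    if total <= 0 or any(part < 0 for part in composition):
        raise ValueError("parts must be nonnegative with a positive sum")
    # Zeros pose no problem here, unlike log-ratio transforms, which need log(0).
    return [math.sqrt(part / total) for part in composition]

def kernel_score(point, data, kappa):
    """Unnormalized spherical kernel average of exp(kappa * (<u, v> - 1)) over the sample."""
    u = to_sphere(point)
    scores = []
    for row in data:
        v = to_sphere(row)
        dot = sum(a * b for a, b in zip(u, v))
        scores.append(math.exp(kappa * (dot - 1.0)))
    return sum(scores) / len(scores)

# Toy 3-part compositions, two of which contain an exact zero (illustrative data).
data = [[0.5, 0.5, 0.0], [0.6, 0.4, 0.0], [0.4, 0.5, 0.1]]
print(kernel_score([0.5, 0.5, 0.0], data, kappa=20.0))  # query close to the sample
print(kernel_score([0.0, 0.0, 1.0], data, kappa=20.0))  # query far from the sample
```

The score is not a normalized density on the simplex; turning it into one would require the kernel's normalizing constant and the Jacobian of the transform, which are omitted here.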
Testing the equality of high dimensional distributions
IF 1.5, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 212, Article 108245. Pub Date: 2025-07-09. DOI: 10.1016/j.csda.2025.108245
Reza Modarres
Abstract: The Euclidean distance is not a suitable distance for high dimensional settings due to the distance concentration phenomenon. A novel statistic that is inspired by the interpoint distances, but avoids their computation, is proposed for comparing and visualizing high dimensional datasets. The new statistic is based on a high dimensional dissimilarity index that takes advantage of the concentration phenomenon. A simultaneous display of observation means and standard deviations that aids visualization, detection of suspect outliers, and enhances separability among the competing classes in the transformed space is discussed. The finite sample convergence of the dissimilarity indices is studied, nine statistics are compared under several distributions, and three applications are presented.
Citations: 0
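The distance concentration phenomenon mentioned in this abstract is easy to see numerically. The sketch below does not implement the paper's statistic. It only assumes i.i.d. uniform coordinates and reports the relative contrast (max − min) / min of Euclidean distances from one reference point, a quantity that shrinks as the dimension grows. The function name, sample size, and dimensions are illustrative choices.

```python
import math
import random

def relative_contrast(dim, n_points, rng):
    """(max - min) / min of Euclidean distances from one reference point to the others."""
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    ref = points[0]
    dists = [math.dist(ref, p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)
for dim in (2, 10, 100, 1000):
    # As dim grows, nearest and farthest neighbours become almost equally far away.
    print(dim, round(relative_contrast(dim, 50, rng), 3))
```

When the contrast is close to zero, raw Euclidean distances carry little information for separating groups, which motivates statistics that exploit the concentration rather than fight it.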