Computational Statistics & Data Analysis: Latest Articles

Fast autoregressive model for multivariate dependent outcomes with application to lipidomics analysis for Alzheimer’s disease and APOE-ε4
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 215, Article 108280. Pub Date: 2025-09-29. DOI: 10.1016/j.csda.2025.108280
Hwiyoung Lee, Zhenyao Ye, Chixiang Chen, Peter Kochunov, L. Elliot Hong, Shuo Chen
Abstract: Association analysis of multivariate omics outcomes is challenging due to the high dimensionality and inter-correlation among outcome variables. In practice, the classic multi-univariate analysis approaches are commonly employed, utilizing linear regression models for each individual outcome followed by adjustments for multiplicity through control of the false discovery rate (FDR) or family-wise error rate (FWER). While straightforward, these multi-univariate methods overlook dependencies between outcome variables. This oversight leads to less accurate statistical inferences, characterized by lower power and an increased false discovery rate, ultimately resulting in reduced replicability across studies. Recently, advanced frequentist and Bayesian methods have been developed to account for these dependencies. However, these methods often pose significant computational challenges for researchers in the field. To bridge this gap, a computationally efficient autoregressive multivariate regression model is proposed that explicitly accounts for the dependence structure among outcome variables. Through extensive simulations, it is demonstrated that the approach provides more accurate multivariate inferences than traditional methods and remains robust even under model misspecification. Additionally, the proposed method is applied to investigate whether the associations between serum lipidomics outcomes and Alzheimer’s disease differ between ε4 allele carriers and non-carriers of the apolipoprotein E (APOE) gene.
Citations: 0
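The multi-univariate baseline described in the Lee et al. abstract above fits one regression per outcome and then adjusts the resulting p-values for multiplicity. The following is a minimal sketch of the FDR-adjustment step only. It assumes the Benjamini-Hochberg step-up procedure (the abstract does not say which FDR procedure is meant), and the helper name `benjamini_hochberg` and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Hypothetical helper for illustration: the Benjamini-Hochberg step-up
    procedure applied to one p-value per outcome variable.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at most cutoff_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values, one per outcome variable.
p_values = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
decisions = benjamini_hochberg(p_values, alpha=0.05)
```

Each p-value here comes from a separate per-outcome model, so an adjustment of this kind does not use the dependence among outcomes that the proposed autoregressive model is designed to capture.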
Bootstrap-based goodness-of-fit test for parametric families of conditional distributions
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 215, Article 108289. Pub Date: 2025-09-27. DOI: 10.1016/j.csda.2025.108289
Gitte Kremling, Gerhard Dikta
Abstract: A consistent goodness-of-fit test for distributional regression is introduced. The test statistic is based on a process that traces the difference between a nonparametric and a semi-parametric estimate of the marginal distribution function of Y. As its asymptotic null distribution is not distribution-free, a parametric bootstrap method is used to determine critical values. Empirical results suggest that, in certain scenarios, the test outperforms existing specification tests by achieving a higher power and thereby offering greater sensitivity to deviations from the assumed parametric distribution family. Notably, the proposed test does not involve any hyperparameters and can easily be applied to individual datasets using the gofreg package in R.
Citations: 0
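The Kremling and Dikta abstract above relies on a parametric bootstrap because the null distribution of the test statistic is not distribution-free. The sketch below shows the general bootstrap idea in a much simpler setting than the paper's. It assumes a univariate exponential model with no covariates and uses a Kolmogorov-Smirnov distance as a stand-in statistic. It is not the statistic from the paper and not the API of the gofreg package, and all function names are hypothetical.

```python
import math
import random


def ks_statistic_exponential(sample):
    """Kolmogorov-Smirnov distance between the empirical CDF of `sample`
    and an exponential CDF whose rate is estimated from the same sample."""
    n = len(sample)
    rate = n / sum(sample)  # maximum likelihood estimate of the rate
    distance = 0.0
    for i, x in enumerate(sorted(sample)):
        fitted = 1.0 - math.exp(-rate * x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        distance = max(distance, (i + 1) / n - fitted, fitted - i / n)
    return distance


def bootstrap_p_value(sample, n_boot=500, seed=0):
    """Parametric bootstrap p-value for the null 'the data are exponential'."""
    rng = random.Random(seed)
    n = len(sample)
    observed = ks_statistic_exponential(sample)
    rate = n / sum(sample)
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted null model; the statistic re-estimates
        # the rate on each simulated sample, mirroring the original fit.
        simulated = [rng.expovariate(rate) for _ in range(n)]
        if ks_statistic_exponential(simulated) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


# Made-up data that are clearly not exponential: uniform on [10, 11].
data_rng = random.Random(1)
uniform_sample = [data_rng.uniform(10.0, 11.0) for _ in range(50)]
p_value = bootstrap_p_value(uniform_sample)
```

Re-estimating the parameter inside every bootstrap replicate is the point of the exercise: the critical values then reflect the extra variability introduced by fitting the model to the same data that are being tested.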
Resampling NANCOVA: Nonparametric analysis of covariance in small samples
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 215, Article 108290. Pub Date: 2025-09-26. DOI: 10.1016/j.csda.2025.108290
Konstantin Emil Thiel, Paavo Sattler, Arne C. Bathke, Georg Zimmermann
Abstract: Analysis of covariance is a crucial method for improving precision of statistical tests for factor effects in randomized experiments. However, existing solutions suffer from one or more of the following limitations: (i) they are not suitable for ordinal data (as endpoints or explanatory variables); (ii) they require semiparametric model assumptions; (iii) they are inapplicable to small data scenarios due to often poor type-I error control; or (iv) they provide only approximate testing procedures and (asymptotically) exact tests are missing. A resampling approach to the NANCOVA framework is investigated. NANCOVA is a fully nonparametric model based on relative effects that allows for an arbitrary number of covariates and groups, where both outcome variable (endpoint) and covariates can be metric or ordinal. Novel NANCOVA tests and a nonparametric competitor test without covariate adjustment were evaluated in extensive simulations. Unlike approximate tests in the NANCOVA framework, the proposed resampling version showed good performance in small sample scenarios and maintained the nominal type-I error well. Resampling NANCOVA also provided consistently high power: up to 26% higher than the test without covariate adjustment in a small sample scenario with four groups and two covariates. Moreover, it is shown that resampling NANCOVA provides an asymptotically exact testing procedure, which makes it the first one with good finite sample performance in the present NANCOVA framework. In summary, resampling NANCOVA can be considered a viable tool for analysis of covariance overcoming issues (i) through (iv).
Citations: 0
Change-point detection in regression models via the max-EM algorithm
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 215, Article 108278. Pub Date: 2025-09-24. DOI: 10.1016/j.csda.2025.108278
Modibo Diabaté, Grégory Nuel, Olivier Bouaziz
Abstract: The problem of breakpoint detection is considered within a regression modeling framework. A novel method, the max-EM algorithm, is introduced, combining a constrained Hidden Markov Model with the Classification-EM algorithm. This algorithm has linear complexity and provides accurate detection of breakpoints and estimation of parameters. A theoretical result is derived, showing that the likelihood of the data, as a function of the regression parameters and the breakpoint locations, increases at each step of the algorithm. Two initialization methods for the breakpoint locations are also presented to address local maxima issues. Finally, a statistical test in the one-breakpoint situation is developed. Simulation experiments based on linear, logistic, Poisson and Accelerated Failure Time regression models show that the final method, which includes the initialization procedure and the max-EM algorithm, performs strongly in terms of both parameter estimation and breakpoint detection. The statistical test is also evaluated and exhibits a correct rejection rate under the null hypothesis and a strong power under various alternatives. Two real datasets are analyzed, the UCI bike sharing and the health disease data, illustrating the usefulness of the method for detecting heterogeneity in the distribution of the data.
Citations: 0
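To make the breakpoint problem in the Diabaté, Nuel and Bouaziz abstract above concrete, here is a minimal sketch of the simplest case: a single breakpoint in a piecewise-constant mean, found by exhaustive least squares. This is a brute-force baseline, not the max-EM algorithm, and the function name `best_single_breakpoint` and the toy series are hypothetical.

```python
def best_single_breakpoint(y):
    """Return (k, sse): the split index k (1 <= k < len(y)) such that fitting
    one mean to y[:k] and another mean to y[k:] minimizes the total squared
    error, together with that minimal error."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(y)
    total_sq = sum(v * v for v in y)
    best_k, best_sse = None, float("inf")
    left_sum = 0.0
    left_sq = 0.0
    for k in range(1, n):
        # Running sums let each candidate split be scored in constant time.
        left_sum += y[k - 1]
        left_sq += y[k - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # SSE of a segment around its own mean: sum(v^2) - (sum v)^2 / length.
        sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


# Toy series whose mean shifts after the fourth observation.
y = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
k, sse = best_single_breakpoint(y)
```

For the toy series the split lands at k = 4, between the fourth and fifth observations. An exhaustive search like this scales poorly once several breakpoints or a full regression model per segment are involved, which is the setting where a linear-complexity method such as the one in the abstract becomes relevant.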
Fast and efficient causal inference in large-scale data via subsampling and projection calibration
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 214, Article 108281. Pub Date: 2025-09-22. DOI: 10.1016/j.csda.2025.108281
Miaomiao Su
Abstract: Estimating the average treatment effect in large-scale datasets faces significant computational and storage challenges. Subsampling has emerged as a critical strategy to mitigate these issues. This paper proposes a novel subsampling method that builds on the G-estimation method offering the double robustness property. The proposed method uses a small subset of data to estimate computationally complex nuisance parameters, while leveraging the full dataset for the computationally simple final estimation. To ensure that the resulting estimator remains first-order insensitive to variations in nuisance parameters, a projection approach is introduced to optimize the estimation of the outcome regression function and treatment regression function such that the Neyman orthogonality conditions are satisfied. It is shown that the resulting estimator is asymptotically normal and achieves the same convergence rate as the full data-based estimator when either the treatment or the outcome model is correctly specified. Additionally, when both models are correctly specified, the proposed estimator achieves the same asymptotic variance as the full data-based estimator. The finite sample performance of the proposed method is demonstrated through simulation studies and an application to birth data, comprising over 30 million observations collected over the past eight years. Numerical results indicate that the proposed estimator is nearly as computationally efficient as the uniform subsampling estimator, while achieving similar estimation efficiency to the full data-based G-estimator.
Citations: 0
Gamma approximation of stratified truncated exact test (GASTE-test) & application
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 214, Article 108277. Pub Date: 2025-09-21. DOI: 10.1016/j.csda.2025.108277
Alexandre Wendling, Clovis Galiez
Abstract: The analysis of binary outcomes and features, such as the effect of vaccination on health, often relies on 2 × 2 contingency tables. However, confounding factors such as age or gender call for stratified analysis, by creating sub-tables, which is common in bioscience, epidemiological, and social research, as well as in meta-analyses. Traditional methods for testing associations across strata, such as the Cochran-Mantel-Haenszel (CMH) test, struggle with small sample sizes and heterogeneity of effects between strata. Exact tests can address these issues, but are computationally expensive. To address these challenges, the Gamma Approximation of Stratified Truncated Exact (GASTE) test is proposed. It approximates the exact statistic of the combination of p-values with discrete support, leveraging the gamma distribution to approximate the distribution of the test statistic under stratification, providing fast and accurate p-value calculations, even when effects vary between strata. The GASTE method maintains high statistical power and low type I error rates, outperforming traditional methods by offering more sensitive and reliable detection. It is computationally efficient and broadens the applicability of exact tests in research fields with stratified binary data. The GASTE method is demonstrated through two applications: an ecological study of Alpine plant associations and a 1973 case study on admissions at the University of California, Berkeley. The GASTE method offers substantial improvements over traditional approaches. The GASTE method is available as an open-source package at https://github.com/AlexandreWen/gaste. A Python package is available on PyPI at https://pypi.org/project/gaste-test/.
Citations: 0
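The Wendling and Galiez abstract above names the Cochran-Mantel-Haenszel (CMH) test as the traditional method that struggles when effects differ between strata. The sketch below computes the standard CMH chi-square statistic for a list of 2 × 2 tables so that this weakness can be seen on a toy example. It is the classical baseline, not the GASTE test and not the API of the gaste-test package. The function name `cmh_statistic` and the tables are made up, and clamping the continuity-corrected numerator at zero is one common convention assumed here.

```python
def cmh_statistic(tables, continuity=True):
    """Cochran-Mantel-Haenszel chi-square statistic (1 degree of freedom).

    `tables` is a list of strata, each given as (a, b, c, d) for the
    2 x 2 table [[a, b], [c, d]].
    """
    diff_sum = 0.0
    var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two subjects adds nothing
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        # Mean and variance of the top-left cell under the null hypothesis
        # (hypergeometric distribution with fixed margins).
        expected = row1 * col1 / n
        variance = row1 * row2 * col1 * col2 / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
    if var_sum == 0:
        raise ValueError("all strata are degenerate")
    numerator = abs(diff_sum)
    if continuity:
        # Assumed convention: subtract 0.5 but never go below zero.
        numerator = max(numerator - 0.5, 0.0)
    return numerator ** 2 / var_sum


# Two strata with effects of equal size and opposite direction.
opposite = [(10, 5, 5, 10), (5, 10, 10, 5)]
# Two strata with the same effect in the same direction.
same_direction = [(10, 5, 5, 10), (10, 5, 5, 10)]

stat_opposite = cmh_statistic(opposite)
stat_same = cmh_statistic(same_direction, continuity=False)
```

In the `opposite` example the stratum contributions cancel exactly, so the statistic is zero even though each stratum shows an association. This is the heterogeneity problem the abstract refers to.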
Generalized composite multi-sample tests for high-dimensional data
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 214, Article 108279. Pub Date: 2025-09-21. DOI: 10.1016/j.csda.2025.108279
Xiaoli Kong, Alejandro Villasante-Tezanos, David W. Fardo, Solomon W. Harrar
Abstract: High-dimensional data is ubiquitous in studies involving omics, human movement, and imaging. A multivariate comparison method is proposed for such types of data when either the dimension or the replication size substantially exceeds the other. A testing procedure is introduced that centers and scales a composite measure of distance statistic among the samples to appropriately account for high dimensions and/or large sample sizes. The properties of the test statistic are examined both theoretically and empirically. The proposed procedure demonstrates superior performance in simulation studies and an application to confirm the involvement of previously identified genes in the stages of invasive breast cancer.
Citations: 0
Recursive nonparametric predictive for a discrete regression model
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 215, Article 108275. Pub Date: 2025-09-16. DOI: 10.1016/j.csda.2025.108275
Lorenzo Cappello, Stephen G. Walker
Abstract: A recursive algorithm is proposed to estimate a set of distribution functions indexed by a regressor variable. The procedure is fully nonparametric and has a Bayesian motivation and interpretation. Indeed, the recursive algorithm follows a certain Bayesian update, defined by the predictive distribution of a Dirichlet process mixture of linear regression models. Consistency of the algorithm is demonstrated under mild assumptions, and numerical accuracy in finite samples is shown via simulations and real data examples. The algorithm is very fast to implement, parallelizable, and sequential, and it requires limited computing power.
Citations: 0
An algorithm for estimating threshold boundary regression models
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 214, Article 108274. Pub Date: 2025-09-15. DOI: 10.1016/j.csda.2025.108274
Chih-Hao Chang, Takeshi Emura, Shih-Feng Huang
Abstract: This paper presents an innovative iterative two-stage algorithm designed for estimating threshold boundary regression (TBR) models. By transforming the non-differentiable least-squares (LS) problem inherent in fitting TBR models into an optimization framework, our algorithm combines the optimization of a weighted classification error function for the threshold model with obtaining LS estimators for regression models. To improve the efficiency and flexibility of TBR model estimation, we integrate the weighted support vector machine (WSVM) as a surrogate method for solving the weighted classification problem. The TBR-WSVM algorithm offers several key advantages over recently developed methods: it eliminates pre-specification requirements for threshold parameters, accommodates flexible estimation of nonlinear threshold boundaries, and streamlines the estimation process. We conducted several simulation studies to illustrate the finite-sample performance of TBR-WSVM. Finally, we demonstrate the practical applicability of the TBR model through a real data analysis.
Citations: 0
Rate accelerated inference for integrals of multivariate random functions
IF 1.6, Zone 3, Mathematics
Computational Statistics & Data Analysis, Volume 214, Article 108273. Pub Date: 2025-09-11. DOI: 10.1016/j.csda.2025.108273
Valentin Patilea, Sunny G. W. Wang
Abstract: The computation of integrals is a fundamental task in the analysis of functional data, where the data are typically considered as random elements in a space of square-integrable functions. Effective unbiased estimation and inference procedures are proposed for integrals of uni- and multivariate random functions. Applications to key problems in functional data analysis involving random design points are examined and illustrated. In the absence of noise, the proposed estimates converge faster than the sample mean and standard numerical integration algorithms. The estimator also supports effective inference by generally providing better coverage with shorter confidence and prediction intervals in both noisy and noiseless settings.
Citations: 0