Statistics and Computing最新文献

筛选
英文 中文
High-dimensional missing data imputation via undirected graphical model 通过无向图模型进行高维缺失数据估算
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-08-01 DOI: 10.1007/s11222-024-10475-9
Yoonah Lee, Seongoh Park
{"title":"High-dimensional missing data imputation via undirected graphical model","authors":"Yoonah Lee, Seongoh Park","doi":"10.1007/s11222-024-10475-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10475-9","url":null,"abstract":"<p>Multiple imputation is a practical approach in analyzing incomplete data, with multiple imputation by chained equations (MICE) being popularly used. MICE specifies a conditional distribution for each variable to be imputed, but estimating it is inherently a high-dimensional problem for large-scale data. Existing approaches propose to utilize regularized regression models, such as lasso. However, the estimation of them occurs iteratively across all incomplete variables, leading to a considerable increase in computational burden, as demonstrated in our simulation study. To overcome this computational bottleneck, we propose a novel method that estimates the conditional independence structure among variables before the imputation procedure. We extract such information from an undirected graphical model, leveraging the graphical lasso method based on the inverse probability weighting estimator. Our simulation study verifies the proposed method is way faster against the existing methods, while still maintaining comparable imputation performance.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"50 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Distributed subsampling for multiplicative regression 用于乘法回归的分布式子采样
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-08-01 DOI: 10.1007/s11222-024-10477-7
Xiaoyan Li, Xiaochao Xia, Zhimin Zhang
{"title":"Distributed subsampling for multiplicative regression","authors":"Xiaoyan Li, Xiaochao Xia, Zhimin Zhang","doi":"10.1007/s11222-024-10477-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10477-7","url":null,"abstract":"<p>Multiplicative regression is a useful alternative tool in modeling positive response data. This paper proposes two distributed estimators for multiplicative error model on distributed system with non-randomly distributed massive data. We first present a Poisson subsampling procedure to obtain a subsampling estimator based on the least product relative error (LPRE) loss, which is effective on a distributed system. Theoretically, we justify the subsampling estimator by establishing its convergence rate, asymptotic normality and deriving the optimal subsampling probabilities in terms of the L-optimality criterion. Then, we provide a distributed LPRE estimator based on the Poisson subsampling (DLPRE-P), which is communication-efficient since it needs to transmit a very small subsample from local machines to the central site, which is empirically feasible, together with the gradient of the loss. Practically, due to the use of Newton–Raphson iteration, the Hessian matrix can be computed more robustly using the subsampled data than using one local dataset. We also show that the DLPRE-P estimator is statistically efficient as the global estimator, which is based on putting all the datasets together. Furthermore, we propose a distributed regularized LPRE estimator (DRLPRE-P) to consider the variable selection problem in high dimension. A distributed algorithm based on the alternating direction method of multipliers (ADMM) is developed for implementing the DRLPRE-P. The oracle property holds for DRLPRE-P. Finally, simulation experiments and two real-world data analyses are conducted to illustrate the performance of our methods.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"46 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Detection of spatiotemporal changepoints: a generalised additive model approach 检测时空变化点:广义相加模型方法
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-08-01 DOI: 10.1007/s11222-024-10478-6
Michael J. Hollaway, Rebecca Killick
{"title":"Detection of spatiotemporal changepoints: a generalised additive model approach","authors":"Michael J. Hollaway, Rebecca Killick","doi":"10.1007/s11222-024-10478-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10478-6","url":null,"abstract":"<p>The detection of changepoints in spatio-temporal datasets has been receiving increased focus in recent years and is utilised in a wide range of fields. With temporal data observed at different spatial locations, the current approach is typically to use univariate changepoint methods in a marginal sense with the detected changepoint being representative of a single location only. We present a spatio-temporal changepoint method that utilises a generalised additive model (GAM) dependent on the 2D spatial location and the observation time to account for the underlying spatio-temporal process. We use the full likelihood of the GAM in conjunction with the pruned linear exact time (PELT) changepoint search algorithm to detect multiple changepoints across spatial locations in a computationally efficient manner. When compared to a univariate marginal approach our method is shown to perform more efficiently in simulation studies at detecting true changepoints and demonstrates less evidence of overfitting. Furthermore, as the approach explicitly models spatio-temporal dependencies between spatial locations, any changepoints detected are common across the locations. We demonstrate an application of the method to an air quality dataset covering the COVID-19 lockdown in the United Kingdom.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"187 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141882926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Mallows-type model averaging estimator for ridge regression with randomly right censored data 用于随机右删失数据脊回归的马洛式模型平均估算器
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-07-29 DOI: 10.1007/s11222-024-10472-y
Jie Zeng, Guozhi Hu, Weihu Cheng
{"title":"A Mallows-type model averaging estimator for ridge regression with randomly right censored data","authors":"Jie Zeng, Guozhi Hu, Weihu Cheng","doi":"10.1007/s11222-024-10472-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10472-y","url":null,"abstract":"<p>Instead of picking up a single ridge parameter in ridge regression, this paper considers a frequentist model averaging approach to appropriately combine the set of ridge estimators with different ridge parameters, when the response is randomly right censored. Within this context, we propose a weighted least squares ridge estimation for unknown regression parameter. A new Mallows-type weight choice criterion is then developed to allocate model weights, where the unknown distribution function of the censoring random variable is replaced by the Kaplan–Meier estimator and the covariance matrix of random errors is substituted by its averaging estimator. Under some mild conditions, we show that when the fitting model is misspecified, the resulting model averaging estimator achieves optimality in terms of minimizing the loss function. Whereas, when the fitting model is correctly specified, the model averaging estimator of the regression parameter is root-<i>n</i> consistent. Additionally, for the weight vector which is obtained by minimizing the new criterion, we establish its rate of convergence to the infeasible optimal weight vector. Simulation results show that our method is better than some existing methods. A real dataset is analyzed for illustration as well.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"150 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Byzantine-robust and efficient distributed sparsity learning: a surrogate composite quantile regression approach 拜占庭式稳健高效分布式稀疏性学习:一种代用复合量化回归方法
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-07-22 DOI: 10.1007/s11222-024-10470-0
Canyi Chen, Zhengtian Zhu
{"title":"Byzantine-robust and efficient distributed sparsity learning: a surrogate composite quantile regression approach","authors":"Canyi Chen, Zhengtian Zhu","doi":"10.1007/s11222-024-10470-0","DOIUrl":"https://doi.org/10.1007/s11222-024-10470-0","url":null,"abstract":"<p>Distributed statistical learning has gained significant traction recently, mainly due to the availability of unprecedentedly massive datasets. The objective of distributed statistical learning is to learn models by effectively utilizing data scattered across various machines. However, its performance can be impeded by three significant challenges: arbitrary noises, high dimensionality, and machine failures—the latter being specifically referred to as Byzantine failure. To address the first two challenges, we propose leveraging the potential of composite quantile regression in conjunction with the <span>(ell _1)</span> penalty. However, this combination introduces a <i>doubly</i> nonsmooth objective function, posing new challenges. In such scenarios, most existing Byzantine-robust methods exhibit slow sublinear convergence rates and fail to achieve near-optimal statistical convergence rates. To fill this gap, we introduce a novel smoothing procedure that effectively handles the nonsmooth aspects. This innovation allows us to develop a Byzantine-robust sparsity learning algorithm that converges provably to the near-optimal convergence rate <i>linearly</i>. Moreover, we establish support recovery guarantees for our proposed methods. We substantiate the effectiveness of our approaches through comprehensive empirical analyses.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"84 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141741483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ForLion: a new algorithm for D-optimal designs under general parametric statistical models with mixed factors ForLion:混合因子一般参数统计模型下 D-最优设计的新算法
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-07-18 DOI: 10.1007/s11222-024-10465-x
Yifei Huang, Keren Li, Abhyuday Mandal, Jie Yang
{"title":"ForLion: a new algorithm for D-optimal designs under general parametric statistical models with mixed factors","authors":"Yifei Huang, Keren Li, Abhyuday Mandal, Jie Yang","doi":"10.1007/s11222-024-10465-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10465-x","url":null,"abstract":"<p>In this paper, we address the problem of designing an experimental plan with both discrete and continuous factors under fairly general parametric statistical models. We propose a new algorithm, named ForLion, to search for locally optimal approximate designs under the D-criterion. The algorithm performs an exhaustive search in a design space with mixed factors while keeping high efficiency and reducing the number of distinct experimental settings. Its optimality is guaranteed by the general equivalence theorem. We present the relevant theoretical results for multinomial logit models (MLM) and generalized linear models (GLM), and demonstrate the superiority of our algorithm over state-of-the-art design algorithms using real-life experiments under MLM and GLM. Our simulation studies show that the ForLion algorithm could reduce the number of experimental settings by 25% or improve the relative efficiency of the designs by 17.5% on average. Our algorithm can help the experimenters reduce the time cost, the usage of experimental devices, and thus the total cost of their experiments while preserving high efficiencies of the designs.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"9 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141741479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection 用于联合判别聚类和特征选择的互信息的稀疏和几何感知广义化
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-07-17 DOI: 10.1007/s11222-024-10467-9
Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso
{"title":"Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection","authors":"Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso","doi":"10.1007/s11222-024-10467-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10467-9","url":null,"abstract":"<p>Feature selection in clustering is a hard task which involves simultaneously the discovery of relevant clusters as well as relevant variables with respect to these clusters. While feature selection algorithms are often model-based through optimised model selection or strong assumptions on the data distribution, we introduce a discriminative clustering model trying to maximise a geometry-aware generalisation of the mutual information called GEMINI with a simple <span>(ell _1)</span> penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and is easily scalable to high-dimensional data and large amounts of samples while only designing a discriminative clustering model. We demonstrate the performances of Sparse GEMINI on synthetic datasets and large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"50 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Optimal designs for nonlinear mixed-effects models using competitive swarm optimizer with mutated agents 利用具有变异代理的竞争性蜂群优化器优化非线性混合效应模型的设计
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-07-17 DOI: 10.1007/s11222-024-10468-8
Elvis Han Cui, Zizhao Zhang, Weng Kee Wong
{"title":"Optimal designs for nonlinear mixed-effects models using competitive swarm optimizer with mutated agents","authors":"Elvis Han Cui, Zizhao Zhang, Weng Kee Wong","doi":"10.1007/s11222-024-10468-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10468-8","url":null,"abstract":"<p>Nature-inspired meta-heuristic algorithms are increasingly used in many disciplines to tackle challenging optimization problems. Our focus is to apply a newly proposed nature-inspired meta-heuristics algorithm called CSO-MA to solve challenging design problems in biosciences and demonstrate its flexibility to find various types of optimal approximate or exact designs for nonlinear mixed models with one or several interacting factors and with or without random effects. We show that CSO-MA is efficient and can frequently outperform other algorithms either in terms of speed or accuracy. The algorithm, like other meta-heuristic algorithms, is free of technical assumptions and flexible in that it can incorporate cost structure or multiple user-specified constraints, such as, a fixed number of measurements per subject in a longitudinal study. When possible, we confirm some of the CSO-MA generated designs are optimal with theory by developing theory-based innovative plots. Our applications include searching optimal designs to estimate (i) parameters in mixed nonlinear models with correlated random effects, (ii) a function of parameters for a count model in a dose combination study, and (iii) parameters in a HIV dynamic model. In each case, we show the advantages of using a meta-heuristic approach to solve the optimization problem, and the added benefits of the generated designs.\u0000</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"62 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141741481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A mixture of experts regression model for functional response with functional covariates 带有功能协变量的功能响应专家混合回归模型
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-07-11 DOI: 10.1007/s11222-024-10455-z
Jean Steve Tamo Tchomgui, Julien Jacques, Guillaume Fraysse, Vincent Barriac, Stéphane Chretien
{"title":"A mixture of experts regression model for functional response with functional covariates","authors":"Jean Steve Tamo Tchomgui, Julien Jacques, Guillaume Fraysse, Vincent Barriac, Stéphane Chretien","doi":"10.1007/s11222-024-10455-z","DOIUrl":"https://doi.org/10.1007/s11222-024-10455-z","url":null,"abstract":"<p>Due to the fast growth of data that are measured on a continuous scale, functional data analysis has undergone many developments in recent years. Regression models with a functional response involving functional covariates, also called “function-on-function”, are thus becoming very common. Studying this type of model in the presence of heterogeneous data can be particularly useful in various practical situations. We mainly develop in this work a function-on-function Mixture of Experts (FFMoE) regression model. Like most of the inference approach for models on functional data, we use basis expansion (B-splines) both for covariates and parameters. A regularized inference approach is also proposed, it accurately smoothes functional parameters in order to provide interpretable estimators. Numerical studies on simulated data illustrate the good performance of FFMoE as compared with competitors. Usefullness of the proposed model is illustrated on two data sets: the reference Canadian weather data set, in which the precipitations are modeled according to the temperature, and a Cycling data set, in which the developed power is explained by the speed, the cyclist heart rate and the slope of the road.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"55 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141612355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A limit formula and a series expansion for the bivariate Normal tail probability 双变量正态尾概率的极限公式和数列展开
IF 2.2 2区 数学
Statistics and Computing Pub Date : 2024-07-08 DOI: 10.1007/s11222-024-10466-w
Siu-Kui Au
{"title":"A limit formula and a series expansion for the bivariate Normal tail probability","authors":"Siu-Kui Au","doi":"10.1007/s11222-024-10466-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10466-w","url":null,"abstract":"<p>This work presents a limit formula for the bivariate Normal tail probability. It only requires the larger threshold to grow indefinitely, but otherwise has no restrictions on how the thresholds grow. The correlation parameter can change and possibly depend on the thresholds. The formula is applicable regardless of Salvage’s condition. Asymptotically, it reduces to Ruben’s formula and Hashorva’s formula under the corresponding conditions, and therefore can be considered a generalisation. Under a mild condition, it satisfies Plackett’s identity on the derivative with respect to the correlation parameter. Motivated by the limit formula, a series expansion is also obtained for the exact tail probability using derivatives of the univariate Mill’s ratio. Under similar conditions for the limit formula, the series converges and its truncated approximation has a small remainder term for large thresholds. To take advantage of this, a simple procedure is developed for the general case by remapping the parameters so that they satisfy the conditions. Examples are presented to illustrate the theoretical findings.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"42 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141575837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信