Computational Statistics & Data Analysis: Latest Articles

A multiple imputation approach for flexible modelling of interval-censored data with missing and censored covariates
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-25 DOI: 10.1016/j.csda.2025.108177
Yichen Lou, Yuqing Ma, Liming Xiang, Jianguo Sun
Volume 209, Article 108177
Abstract: This paper discusses regression analysis of interval-censored failure time data, which commonly occur in biomedical studies, among others. In this setting, the failure event of interest is known only to occur within an interval instead of being observed exactly. In addition to interval censoring on the failure time of interest, covariates may sometimes be missing or censored, which brings extra theoretical and computational challenges for the regression analysis. To deal with such data, we propose a novel multiple imputation approach that uses rejection sampling under a class of semiparametric transformation models. The proposed method is flexible and can lead to more efficient estimation than existing methods, and the resulting estimators are shown to be consistent and asymptotically normal. An extensive simulation study demonstrates that the proposed approach works well in practice. Finally, we apply the proposed approach to a set of real data on Alzheimer's disease that motivated this study.
Citations: 0
Sparse factor analysis for categorical data with the group-sparse generalized singular value decomposition
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-25 DOI: 10.1016/j.csda.2025.108179
Ju-Chi Yu, Julie Le Borgne, Anjali Krishnan, Arnaud Gloaguen, Cheng-Ta Yang, Laura A. Rabin, Hervé Abdi, Vincent Guillemot
Volume 209, Article 108179
Abstract: Correspondence analysis, multiple correspondence analysis, and their discriminant counterparts (i.e., discriminant simple correspondence analysis and discriminant multiple correspondence analysis) are methods of choice for analyzing multivariate categorical data. In these methods, variables are integrated into optimal components computed as linear combinations whose weights are obtained from a generalized singular value decomposition (GSVD) that integrates specific metric constraints on the rows and columns of the original data matrix. The weights of the linear combinations are, in turn, used to interpret the components, and this interpretation is facilitated when 1) the components are pairwise orthogonal and 2) the values of the weights are either large or small but not intermediate, a configuration called a simple or a sparse structure. To obtain such simple configurations, the optimization problem solved by the GSVD is extended to include new constraints that implement component orthogonality and sparse weights. Because multiple correspondence analysis represents qualitative variables by a set of binary columns in the data matrix, an additional group constraint is added to the optimization problem in order to sparsify the whole set of columns representing one qualitative variable. This method, called group-sparse GSVD (gsGSVD), integrates these constraints in a new algorithm via an iterative projection scheme onto the intersection of subspaces where each subspace implements a specific constraint. This algorithm is described in detail, and we show how it can be adapted to the sparsification of simple and multiple correspondence analysis (as well as their barycentric discriminant analysis versions). This algorithm is illustrated with the analysis of four different data sets, each illustrating the sparsification of a particular CA-based method.
Citations: 0
Selecting time-series hyperparameters with the artificial jackknife
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-18 DOI: 10.1016/j.csda.2025.108173
Filippo Pellegrino
Volume 209, Article 108173
Abstract: A generalisation of the delete-d jackknife is proposed for solving hyperparameter selection problems in time series. The method is referred to as the artificial delete-d jackknife, emphasizing that it replaces the classic removal step with a fictitious deletion, wherein observed data points are replaced with artificial missing values. This procedure preserves the data order, ensuring seamless compatibility with time series. The approach is asymptotically justified and its finite-sample properties are studied via simulations. In addition, an application based on foreign exchange rates illustrates its practical relevance.
Citations: 0
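The "fictitious deletion" step described in the abstract above can be illustrated with a minimal sketch. It is not the author's implementation: the function name `artificial_delete_d` is hypothetical, and enumerating every size-d subset is an assumption made for illustration (the abstract does not say how subsets are chosen). How the resulting pseudo-series are then used to score hyperparameters is not stated in the abstract and is left out.

```python
import itertools
import math


def artificial_delete_d(series, d):
    """Yield copies of `series` in which d observations are masked with NaN.

    Unlike the classic delete-d jackknife, no element is removed:
    every pseudo-series keeps the original length and time order.
    """
    values = list(series)
    for idx in itertools.combinations(range(len(values)), d):
        masked = set(idx)
        # Fictitious deletion: the value becomes missing, the position stays.
        yield [math.nan if t in masked else v for t, v in enumerate(values)]


series = [1.0, 2.0, 3.0, 4.0]
subsamples = list(artificial_delete_d(series, d=2))
```

Each pseudo-series has the same length as the input, so any estimation routine that tolerates missing values could, in principle, be run on it without reindexing time.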
Hidden semi-Markov models with inhomogeneous state dwell-time distributions
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-14 DOI: 10.1016/j.csda.2025.108171
Jan-Ole Koslik
Volume 209, Article 108171
Abstract: The well-established methodology for the estimation of hidden semi-Markov models (HSMMs) as hidden Markov models (HMMs) with extended state spaces is further developed. Covariate influences are incorporated across all aspects of the state process model, in particular regarding the distributions governing the state dwell time. The special case of periodically varying covariate effects on the state dwell-time distributions (and possibly the conditional transition probabilities) is examined in detail. Important properties of these models are derived, including the periodically varying unconditional state distribution as well as the overall state dwell-time distribution. Simulation studies are conducted to assess key properties of these models and provide recommendations for hyperparameter settings. A case study involving an HSMM with periodically varying dwell-time distributions is presented to analyse the movement trajectory of an Arctic muskox, demonstrating the practical relevance of the developed methodology.
Citations: 0
Model-based edge clustering for weighted networks with a noise component
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-14 DOI: 10.1016/j.csda.2025.108172
Haomin Li, Daniel K. Sewell
Volume 209, Article 108172
Abstract: Clustering is a fundamental task in network analysis, essential for uncovering hidden structures within complex systems. Edge clustering, which focuses on relationships between nodes rather than the nodes themselves, has gained increased attention in recent years. However, existing edge clustering algorithms often overlook the significance of edge weights, which can represent the strength or capacity of connections, and fail to account for noisy edges, that is, connections that obscure the true structure of the network. To address these challenges, the Weighted Edge Clustering Adjusting for Noise (WECAN) model is introduced. This novel algorithm integrates edge weights into the clustering process and includes a noise component that filters out spurious edges. WECAN offers a data-driven approach to distinguishing between meaningful and noisy edges, avoiding the arbitrary thresholding commonly used in network analysis. Its effectiveness is demonstrated through simulation studies and applications to real-world datasets, showing significant improvements over traditional clustering methods. Additionally, the R package "WECAN" has been developed to facilitate its practical implementation.
Citations: 0
Functional nonlinear principal component analysis
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-13 DOI: 10.1016/j.csda.2025.108169
Qingzhi Zhong, Xinyuan Song
Volume 209, Article 108169
Abstract: The widely adopted dimension reduction technique, functional principal component analysis (FPCA), typically represents functional data as a linear combination of functional principal components (FPCs) and their corresponding scores. However, this linear formulation is too restrictive to reflect reality because it fails to capture the nonlinear dependence of functional data when nonlinear features are present in the data. This study develops a novel FPCA model to uncover the nonlinear structures of functional data. The proposed method can accommodate multivariate functional data observed on different domains, and multidimensional functional data with gaps and holes. To navigate the complexities of spatial structure in multidimensional functional variables, tensor product smoothing and spline smoothing over triangulation are employed, providing precise tools for approximating nonparametric functions. Furthermore, an efficient estimation approach and theory are developed for the case in which the number of FPCs diverges to infinity. To assess its performance comprehensively, extensive simulations are conducted, and the proposed method is applied to real data from the Alzheimer's Disease Neuroimaging Initiative study, affirming its practical efficacy in uncovering and interpreting nonlinear structures inherent in functional data.
Citations: 0
Manifold-valued models for analysis of EEG time series data
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-07 DOI: 10.1016/j.csda.2025.108168
Tao Ding, Tom M.W. Nye, Yujiang Wang
Volume 209, Article 108168
Abstract: EEG (electroencephalogram) records brain electrical activity and is a vital clinical tool in the diagnosis and treatment of epilepsy. Time series of covariance matrices between EEG channels for patients suffering from epilepsy, obtained from an open-source dataset, are analysed. The aim is two-fold: to develop a model with interpretable parameters for different possible modes of EEG dynamics, and to explore the extent to which modelling results are affected by the choice of geometry imposed on the space of covariance matrices. The space of full-rank covariance matrices of fixed dimension forms a smooth manifold, and any statistical analysis inherently depends on the choice of metric or Riemannian structure on this manifold. The model specifies a distribution for the tangent direction vector at any time point, combining an autoregressive term, a mean-reverting term and a form of Gaussian noise. Parameter inference is performed by maximum likelihood estimation, and we compare modelling results obtained using the standard Euclidean geometry and the affine invariant geometry on covariance matrices. The findings reveal distinct dynamics between epileptic seizures and interictal periods (between seizures), with interictal series characterized by strong mean reversion and absence of autoregression, while seizures exhibit significant autoregressive components with weaker mean reversion. The fitted models are also used to measure seizure dissimilarity within and between patients.
Citations: 0
Regression analysis of elliptically symmetric directional data
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-03-03 DOI: 10.1016/j.csda.2025.108167
Zehao Yu, Xianzheng Huang
Volume 208, Article 108167
Abstract: A comprehensive toolkit is developed for regression analysis of directional data based on a flexible class of angular Gaussian distributions. Informative testing procedures are proposed to assess rotational symmetry around the mean direction and the dependence of model parameters on covariates. Bootstrap-based algorithms are provided to assess the significance of the proposed test statistics. Moreover, a prediction region that achieves the smallest volume in a class of ellipsoidal prediction regions of the same coverage probability is constructed. The efficacy of these inference procedures is demonstrated in simulation experiments. Finally, this new toolkit is used to analyze directional data originating from a hydrology study and a bioinformatics application.
Citations: 0
An accurate computational approach for partial likelihood using Poisson-binomial distributions
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-02-24 DOI: 10.1016/j.csda.2025.108161
Youngjin Cho, Yili Hong, Pang Du
Volume 208, Article 108161
Abstract: In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk score ratios based on a continuous time model, and thus result in parameter estimates from only an approximate partial likelihood. By revisiting the original partial likelihood idea, an accurate partial likelihood computing method for the Cox model is proposed, which calculates the exact conditional probability using the Poisson-binomial distribution. New estimating and inference procedures are developed, and theoretical results are established for the proposed computational procedure. Although ties are common in real studies, current theories for the Cox model mostly do not consider tied data. In contrast, the new approach includes the theory for grouped data, which allows ties, and also includes the theory for continuous data without ties, providing a unified framework for computing partial likelihood for data with or without ties. Numerical results show that the proposed method outperforms current methods in reducing bias and mean squared error, while achieving improved confidence interval coverage rates, especially when there are many ties or when the variability in risk scores is large. The methods are also compared in real applications.
Citations: 0
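For readers unfamiliar with the Poisson-binomial distribution named in the abstract above, the following is a minimal sketch of its probability mass function: the distribution of the number of successes among independent Bernoulli trials with unequal success probabilities. The function name `poisson_binomial_pmf` and the example probabilities are assumptions for illustration; the sketch shows only the distribution itself, not how the authors build the partial likelihood from it.

```python
def poisson_binomial_pmf(probs):
    """Return [P(S = 0), ..., P(S = n)] for S = sum of independent Bernoulli(p_i).

    Standard dynamic-programming recursion: add one trial at a time and
    split the current mass between "failure" (count unchanged) and
    "success" (count + 1). Cost is O(n^2) for n trials.
    """
    pmf = [1.0]  # with zero trials, S = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


# Hypothetical per-subject event probabilities, chosen only for illustration.
pmf = poisson_binomial_pmf([0.1, 0.5, 0.9])
```

When all probabilities are equal, the result reduces to the ordinary binomial distribution, which gives a quick sanity check for the recursion.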
Communication-efficient estimation and inference for high-dimensional longitudinal data
IF 1.5 | Zone 3, Mathematics
Computational Statistics & Data Analysis Pub Date : 2025-02-24 DOI: 10.1016/j.csda.2025.108154
Xing Li, Yanjing Peng, Lei Wang
Volume 208, Article 108154
Abstract: With the rapid growth of modern science and technology, distributed longitudinal data have drawn attention in a wide range of areas. Recognizing that not all covariate effects are parameters of interest, we focus on the distributed estimation and statistical inference of a pre-conceived low-dimensional parameter in high-dimensional longitudinal GLMs with canonical links. To mitigate the impact of high-dimensional nuisance parameters and incorporate the within-subject correlation simultaneously, a decorrelated quadratic inference function is proposed for enhancing the estimation efficiency. Two communication-efficient surrogate decorrelated score estimators based on multi-round iterative algorithms are proposed. The error bounds and limiting distribution of the proposed estimators are established, and extensive numerical experiments demonstrate the effectiveness of our method. An application to the National Longitudinal Survey of Youth Dataset is also presented.
Citations: 0