Computational Statistics & Data Analysis: Latest Articles

Multiply robust estimation of causal effects using linked data
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108175. Pub Date: 2025-04-02. DOI: 10.1016/j.csda.2025.108175
Shanshan Luo, Yechi Zhang, Wei Li, Zhi Geng
Unmeasured confounding presents a common challenge in observational studies, potentially making standard causal parameters unidentifiable without additional assumptions. Given the increasing availability of diverse data sources, exploiting data linkage offers a potential solution to mitigate unmeasured confounding within a primary study of interest. However, this approach often introduces selection bias, as data linkage is feasible only for a subset of the study population. To address such a concern, this paper explores three nonparametric identification strategies assuming that a unit's inclusion in the linked cohort is determined solely by the observed confounders, while acknowledging that the ignorability assumption may depend on some partially unobserved covariates. The existence of multiple identification strategies motivates the development of estimators that effectively capture distinct components of the observed data distribution. Appropriately combining these estimators yields triply robust estimators for the average treatment effect. These estimators remain consistent if at least one of the three distinct parts of the observed data law is correct. Moreover, they are locally efficient if all the models are correctly specified. The proposed estimators are evaluated using simulation studies and real data analysis.
Citations: 0
Eliciting prior information from clinical trials via calibrated Bayes factor
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108180. Pub Date: 2025-03-31. DOI: 10.1016/j.csda.2025.108180
Roberto Macrì Demartino, Leonardo Egidi, Nicola Torelli, Ioannis Ntzoufras
In the Bayesian framework power prior distributions are increasingly adopted in clinical trials and similar studies to incorporate external and past information, typically to inform the parameter associated with a treatment effect. Their use is particularly effective in scenarios with small sample sizes and where robust prior information is available. A crucial component of this methodology is represented by its weight parameter, which controls the volume of historical information incorporated into the current analysis. Although this parameter can be modeled as either fixed or random, eliciting its prior distribution via a full Bayesian approach remains challenging. In general, this parameter should be carefully selected to accurately reflect the available historical information without dominating the posterior inferential conclusions. A novel simulation-based calibrated Bayes factor procedure is proposed to elicit the prior distribution of the weight parameter, allowing it to be updated according to the strength of the evidence in the data. The goal is to facilitate the integration of historical data when there is agreement with current information and to limit it when discrepancies arise in terms, for instance, of prior-data conflicts. The performance of the proposed method is tested through simulation studies and applied to real data from clinical trials.
Citations: 0
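To make the role of the power-prior weight parameter in the preceding abstract concrete, here is a minimal sketch for the textbook conjugate case of a normal mean with known variance and a flat initial prior. It is not the calibrated Bayes factor procedure of the paper: the weight `a0` is simply fixed by hand, and the function name `power_prior_posterior` and all data values are hypothetical.

```python
import numpy as np

def power_prior_posterior(y, y0, a0, sigma=1.0):
    """Posterior mean and sd of a normal mean under a fixed-weight power prior.

    Assumes a known sigma and a flat initial prior. The weight a0 in [0, 1]
    raises the historical likelihood to the power a0, so a0 * len(y0) acts
    as an effective historical sample size.
    """
    y, y0 = np.asarray(y, float), np.asarray(y0, float)
    n_eff = a0 * y0.size + y.size
    mean = (a0 * y0.sum() + y.sum()) / n_eff
    sd = sigma / np.sqrt(n_eff)
    return mean, sd

current = [1.0, 2.0, 3.0]      # hypothetical current-trial data
historical = [5.0, 7.0]        # hypothetical historical data
for a0 in (0.0, 0.5, 1.0):
    print(a0, power_prior_posterior(current, historical, a0))
```

With `a0 = 0` the historical data are ignored, and with `a0 = 1` they are pooled with the current data at full weight; intermediate values interpolate between the two.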
Discretization: Privacy-preserving data publishing for causal discovery
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108174. Pub Date: 2025-03-27. DOI: 10.1016/j.csda.2025.108174
Youngmin Ahn, Woongjoon Park, Gunwoong Park
As the importance of data privacy continues to grow, data masking has emerged as a crucial method. Notably, data masking techniques aim to protect individual privacy, while enabling data analysts to derive meaningful statistical results, such as the identification of directional or causal relationships between variables. Hence, this study demonstrates the advantages of a quantile-based discretization for protecting privacy and uncovering the relationships between variables in Gaussian directed acyclic graphical (DAG) models. Specifically, it introduces quantile-discretized Gaussian DAG models where each node variable is discretized based on the quantiles. Additionally, it proposes the bi-partition process, which aids in recovering the covariance matrix; hence, the models can be identifiable. Furthermore, a consistent algorithm is developed for learning the underlying structure using the quantile-based discretized data. Finally, through numerical experiments and the application of DAG learning algorithms to discretized MLB data, the proposed algorithm is demonstrated to significantly outperform the state-of-the-art DAG model learning algorithms.
Citations: 0
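The quantile-based discretization step mentioned in the preceding abstract can be sketched generically as follows. This shows only the masking of one continuous variable into quantile bins; the bi-partition process and the DAG learning algorithm are not reproduced. The function name `quantile_discretize` and the choice of four levels are assumptions made for illustration.

```python
import numpy as np

def quantile_discretize(x, n_levels=4):
    """Map each value to the index of its quantile bin (0 .. n_levels - 1)."""
    x = np.asarray(x, float)
    # Interior cut points at equally spaced probabilities, e.g. 0.25, 0.5, 0.75.
    probs = np.linspace(0, 1, n_levels + 1)[1:-1]
    cuts = np.quantile(x, probs)
    # np.digitize returns, for each value, the number of cut points <= value.
    return np.digitize(x, cuts)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                 # simulated continuous node variable
levels = quantile_discretize(x, n_levels=4)
print(np.bincount(levels))                # roughly equal counts per level
```

Only the bin index is published, so the exact values are hidden while the rank ordering between bins is retained.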
Efficient regularized estimation of graphical proportional hazards model with interval-censored data
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108178. Pub Date: 2025-03-27. DOI: 10.1016/j.csda.2025.108178
Huimin Lu, Yilong Wang, Heming Bing, Shuying Wang, Niya Li
Variable selection is discussed in many cases in survival analysis. In particular, the analysis of using proportional hazards (PH) models to deal with censored survival data has established a large amount of literature. Based on interval-censored data, this paper discusses the situation of complex network structures existing in covariates. To address the issue, a more flexible and versatile PH model has been developed by combining probabilistic graphical models with PH models, to describe the correlation between covariates. Based on the block coordinate descent method, a penalized estimation method is proposed, which can simultaneously perform variable selection and parameter estimation. The effectiveness of the proposed model and its parameter estimation method are evaluated through simulation studies and the analysis of clinical trial data related to Alzheimer's disease, confirming the reliability and accuracy of the proposed model and method.
Citations: 0
Linear covariance selection model via ℓ1-penalization
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108176. Pub Date: 2025-03-27. DOI: 10.1016/j.csda.2025.108176
Kwan-Young Bak, Seongoh Park
This paper presents a study on an ℓ1-penalized covariance regression method. Conventional approaches in high-dimensional covariance estimation often lack the flexibility to integrate external information. As a remedy, we adopt the regression-based covariance modeling framework and introduce a linear covariance selection model (LCSM) to encompass a broader spectrum of covariance structures when covariate information is available. Unlike existing methods, we do not assume that the true covariance matrix can be exactly represented by a linear combination of known basis matrices. Instead, we adopt additional basis matrices for a portion of the covariance patterns not captured by the given bases. To estimate high-dimensional regression coefficients, we exploit the sparsity-inducing ℓ1-penalization scheme. Our theoretical analyses are based on the (symmetric) matrix regression model with additive random error matrix, which allows us to establish new non-asymptotic convergence rates of the proposed covariance estimator. The proposed method is implemented with the coordinate descent algorithm. We conduct empirical evaluation on simulated data to complement theoretical findings and underscore the efficacy of our approach. To show a practical applicability of our method, we further apply it to the co-expression analysis of liver gene expression data where the given basis corresponds to the adjacency matrix of the co-expression network.
Citations: 0
A deflation-adjusted Bayesian information criterion for selecting the number of clusters in K-means clustering
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108170. Pub Date: 2025-03-26. DOI: 10.1016/j.csda.2025.108170
Masao Ueki
A deflation-adjusted Bayesian information criterion is proposed by introducing a closed-form adjustment to the variance estimate after K-means clustering. An expected lower bound of the deflation in the variance estimate after K-means clustering is derived and used as an adjustment factor for the variance estimates. The deflation-adjusted variance estimates are applied to the Bayesian information criterion under the Gaussian model for selecting the number of clusters. The closed-form expression makes the proposed deflation-adjusted Bayesian information criterion computationally efficient. Simulation studies show that the deflation-adjusted Bayesian information criterion performs better than other existing clustering methods in some situations, including K-means clustering with the number of clusters selected by standard Bayesian information criteria, the gap statistic, the average silhouette score, the prediction strength, and clustering using a Gaussian mixture model with the Bayesian information criterion. The proposed method is illustrated through a real data application for clustering human genomic data from the 1000 Genomes Project.
Citations: 0
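The variance deflation that the preceding abstract refers to can be seen in a small simulation. The sketch below does not implement the paper's closed-form adjustment or its information criterion; it only shows that splitting a single Gaussian sample with K-means (here K = 2 in one dimension) yields a pooled within-cluster variance well below the true variance. For a standard normal split at its mean, each half has variance 1 − 2/π ≈ 0.36. The helper `two_means_1d` is a hypothetical, simplified Lloyd iteration written for this illustration.

```python
import numpy as np

def two_means_1d(x, n_iter=100):
    """Plain Lloyd iterations for K = 2 on one-dimensional data."""
    x = np.asarray(x, float)
    centers = np.array([x.min(), x.max()])   # start from the two extremes
    for _ in range(n_iter):
        # Assign each point to the nearer of the two centers.
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        new = np.array([x[labels == k].mean() for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(1)
x = rng.normal(size=5000)                    # one Gaussian cluster, variance 1
labels, centers = two_means_1d(x)
within_ss = sum(((x[labels == k] - centers[k]) ** 2).sum() for k in (0, 1))
pooled_var = within_ss / x.size
print("overall variance:", x.var())
print("pooled within-cluster variance:", pooled_var)
```

Because the within-cluster variance shrinks even when no real cluster structure exists, a criterion that plugs it in unadjusted tends to favor too many clusters, which is the motivation for an adjustment factor.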
A multiple imputation approach for flexible modelling of interval-censored data with missing and censored covariates
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108177. Pub Date: 2025-03-25. DOI: 10.1016/j.csda.2025.108177
Yichen Lou, Yuqing Ma, Liming Xiang, Jianguo Sun
This paper discusses regression analysis of interval-censored failure time data that commonly occur in biomedical studies among others. For the situation, the failure event of interest is known only to occur within an interval instead of being observed exactly. In addition to interval censoring on the failure time of interest, sometimes covariates may be missing or suffer censoring, which can bring extra theoretical and computational challenges for the regression analysis. To deal with such data, we propose a novel multiple imputation approach with the use of the rejection sampling under a class of semiparametric transformation models. The proposed method is flexible and can lead to more efficient estimation than the existing methods, and the resulting estimators are shown to be consistent and asymptotically normal. An extensive simulation study is conducted and demonstrates that the proposed approach works well in practice. Finally, we apply the proposed approach to a set of real data on Alzheimer's disease that motivated this study.
Citations: 0
Sparse factor analysis for categorical data with the group-sparse generalized singular value decomposition
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108179. Pub Date: 2025-03-25. DOI: 10.1016/j.csda.2025.108179
Ju-Chi Yu, Julie Le Borgne, Anjali Krishnan, Arnaud Gloaguen, Cheng-Ta Yang, Laura A. Rabin, Hervé Abdi, Vincent Guillemot
Correspondence analysis, multiple correspondence analysis, and their discriminant counterparts (i.e., discriminant simple correspondence analysis and discriminant multiple correspondence analysis) are methods of choice for analyzing multivariate categorical data. In these methods, variables are integrated into optimal components computed as linear combinations whose weights are obtained from a generalized singular value decomposition (GSVD) that integrates specific metric constraints on the rows and columns of the original data matrix. The weights of the linear combinations are, in turn, used to interpret the components, and this interpretation is facilitated when components are 1) pairwise orthogonal and 2) when the values of the weights are either large or small but not intermediate, a configuration called a simple or a sparse structure. To obtain such simple configurations, the optimization problem solved by the GSVD is extended to include new constraints that implement component orthogonality and sparse weights. Because multiple correspondence analysis represents qualitative variables by a set of binary columns in the data matrix, an additional group constraint is added to the optimization problem in order to sparsify the whole set of columns representing one qualitative variable. This method, called group-sparse GSVD (gsGSVD), integrates these constraints in a new algorithm via an iterative projection scheme onto the intersection of subspaces where each subspace implements a specific constraint. This algorithm is described in detail, and we show how it can be adapted to the sparsification of simple and multiple correspondence analysis (as well as their barycentric discriminant analysis versions). This algorithm is illustrated with the analysis of four different data sets, each illustrating the sparsification of a particular CA-based method.
Citations: 0
Selecting time-series hyperparameters with the artificial jackknife
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108173. Pub Date: 2025-03-18. DOI: 10.1016/j.csda.2025.108173
Filippo Pellegrino
A generalisation of the delete-d jackknife is proposed for solving hyperparameter selection problems in time series. The method is referred to as the artificial delete-d jackknife, emphasizing that it replaces the classic removal step with a fictitious deletion, wherein observed data points are replaced with artificial missing values. This procedure preserves the data order, ensuring seamless compatibility with time series. The approach is asymptotically justified and its finite-sample properties are studied via simulations. In addition, an application based on foreign exchange rates illustrates its practical relevance.
Citations: 0
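The "fictitious deletion" idea in the preceding abstract can be sketched as follows: instead of dropping d observations, which would break the time ordering, they are overwritten with missing values. This is a minimal sketch under the assumption that the d positions are drawn uniformly at random; the abstract does not specify the sampling scheme, and the function name `artificial_delete_d` is hypothetical.

```python
import numpy as np

def artificial_delete_d(y, d, n_draws, seed=0):
    """Return n_draws copies of y, each with d observations set to NaN."""
    y = np.asarray(y, float)
    rng = np.random.default_rng(seed)
    out = np.tile(y, (n_draws, 1))            # one row per jackknife pseudo-sample
    for row in out:
        idx = rng.choice(y.size, size=d, replace=False)
        row[idx] = np.nan                     # fictitious deletion: order is kept
    return out

series = np.arange(10.0)                      # toy time series 0, 1, ..., 9
samples = artificial_delete_d(series, d=3, n_draws=5)
print(samples)
```

Each row keeps the original length and ordering, so any estimation routine that tolerates missing observations can be refitted on every row and the resulting forecast errors compared across candidate hyperparameters.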
Hidden semi-Markov models with inhomogeneous state dwell-time distributions
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 209, Article 108171. Pub Date: 2025-03-14. DOI: 10.1016/j.csda.2025.108171
Jan-Ole Koslik
The well-established methodology for the estimation of hidden semi-Markov models (HSMMs) as hidden Markov models (HMMs) with extended state spaces is further developed. Covariate influences are incorporated across all aspects of the state process model, in particular regarding the distributions governing the state dwell time. The special case of periodically varying covariate effects on the state dwell-time distributions (and possibly the conditional transition probabilities) is examined in detail. Important properties of these models are derived, including the periodically varying unconditional state distribution as well as the overall state dwell-time distribution. Simulation studies are conducted to assess key properties of these models and provide recommendations for hyperparameter settings. A case study involving an HSMM with periodically varying dwell-time distributions is presented to analyse the movement trajectory of an Arctic muskox, demonstrating the practical relevance of the developed methodology.
Citations: 0