Computational Statistics & Data Analysis: Latest Articles

Fully nonparametric inverse probability weighting estimation with nonignorable missing data and its extension to missing quantile regression
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108127. Pub Date: 2025-01-20. DOI: 10.1016/j.csda.2025.108127
Lingnan Tai, Li Tao, Jianxin Pan, Man-lai Tang, Keming Yu, Wolfgang Karl Härdle, Maozai Tian
Abstract: In practical data analysis, the not-missing-at-random (NMAR) mechanism is typically more aligned with the natural causes of missing data. The NMAR mechanism is complicated and adaptable, surpassing the capabilities of classical methods in addressing this missing data challenge. A comprehensive analysis framework for the NMAR problem is established, and a novel inverse probability weighting method based on the fully nonparametric exponential tilting model and sieve minimum distance is constructed. Additionally, given the broad field of applications for the quantile regression model, fully nonparametric inverse probability weighting and augmented inverse probability weighting for estimating quantile regression under NMAR are introduced. Simulation studies demonstrate that the proposed methods are better suited for various flexible propensity score functions. In practical applications, our methods are applied to the AIDS Clinical Trials Group Study 175 data to examine the effectiveness of treatments on HIV-infected subjects.
Citations: 0
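As a rough illustration of the inverse probability weighting idea named in the abstract above, the following sketch computes a weighted mean in which each observed outcome is weighted by the reciprocal of its probability of being observed. It is not the paper's estimator: under NMAR the propensity depends on the possibly unobserved outcome and has to be estimated (the abstract mentions an exponential tilting model and sieve minimum distance), whereas here the propensities are simply assumed known. The function name `ipw_mean` and the toy numbers are hypothetical.

```python
def ipw_mean(values, observed, propensity):
    """Normalized (Hajek-type) inverse probability weighted mean.

    values[i]     : outcome (ignored when observed[i] is False)
    observed[i]   : True if the outcome was recorded
    propensity[i] : ASSUMED known probability that unit i is observed
    """
    num = 0.0
    den = 0.0
    for y, r, p in zip(values, observed, propensity):
        if r:
            w = 1.0 / p  # an observed unit stands in for 1/p similar units
            num += w * y
            den += w
    return num / den


# Hypothetical toy data: large outcomes are observed only half of the time.
values = [1.0, 1.0, 5.0, 5.0]
observed = [True, True, True, False]
propensity = [1.0, 1.0, 0.5, 0.5]

# Complete-case mean ignores the missingness mechanism.
naive = sum(y for y, r in zip(values, observed) if r) / sum(observed)
# Weighted mean up-weights the under-represented large outcomes.
weighted = ipw_mean(values, observed, propensity)
```

In this toy setting the complete-case mean is pulled toward the small outcomes, while the weighted mean recovers the full-data mean of 3.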
Quantile feature screening for infinite dimensional data under FDR control
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108132. Pub Date: 2025-01-20. DOI: 10.1016/j.csda.2025.108132
Zhentao Tian, Zhongzhan Zhang
Abstract: This study is focused on the detection of effects of features on an infinite dimensional response through the conditional spatial quantiles (CSQ) of the response given the features, and develops a novel model-free feature screening procedure for the CSQ regression function. Firstly, a new metric named kernel-based conditional quantile dependence (KCQD) is proposed to measure the dependence of the CSQ on a feature. The metric equals 0 if and only if the feature is independent of the CSQ of the response, and thus is employed to detect the contribution of a feature. Then a two-step feature screening procedure with the estimated KCQD scores is developed via a distributed strategy. Theoretical analyses reveal that the new two-step screening method not only has screening consistency and sure screening properties but also achieves control over false discovery rate (FDR). Simulation studies show its ability to control the expected FDR level while maintaining high screening power. The proposed procedure is applied to analyze a magnetoencephalography dataset, and the identified signal positions are anatomically interpretable.
Citations: 0
A Markov model for estimating the cost-effectiveness of immunotherapy for newly diagnosed multiple myeloma patients
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108130. Pub Date: 2025-01-17. DOI: 10.1016/j.csda.2025.108130
Massimo Bilancia, Antonio Giovanni Solimando, Fabio Manca, Angelo Vacca, Roberto Ria
Abstract: Multiple myeloma (MM) is a malignancy of plasma cells, originating from B lymphocytes and accumulating within the bone marrow. The prevalence of MM has increased in industrialized countries, representing 1-1.8% of all cancers and 15% of hematologic malignancies. Immunotherapy has broadened therapeutic options for MM, offering treatments with generally improved efficacy and reduced toxicity compared to conventional therapies. Daratumumab, a monoclonal antibody recently granted regulatory approval, exemplifies this advancement, demonstrating improved patient outcomes. However, the substantial cost of daratumumab has significantly increased per-patient treatment expenditures. Consequently, the economic burden associated with this new class of therapies warrants careful evaluation of their cost-effectiveness. To address this, a six-state non-stationary Markov model was developed for cost-effectiveness analysis of immunotherapy in newly diagnosed MM patients and, more broadly, in the oncohematological patient population. This model aims to provide healthcare professionals and policymakers with actionable insights into cost-effective interventions, supporting informed decisions regarding optimal treatment strategies.
Citations: 0
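To make the notion of a non-stationary Markov cohort model for cost-effectiveness more concrete, here is a minimal sketch under loudly stated assumptions: it uses three states (stable, progressed, dead) rather than the six of the abstract above, no discounting, and entirely hypothetical transition probabilities, costs, and utilities. The names `transition_matrix` and `run_cohort` are illustrative only.

```python
def transition_matrix(t, progression_base):
    """HYPOTHETICAL time-dependent transition probabilities for cycle t.

    States: 0 = stable, 1 = progressed, 2 = dead (absorbing).
    """
    p_prog = min(progression_base * (1 + 0.1 * t), 0.9)  # risk grows with time
    p_die_stable = 0.02
    p_die_prog = 0.20
    return [
        [1 - p_prog - p_die_stable, p_prog, p_die_stable],
        [0.0, 1 - p_die_prog, p_die_prog],
        [0.0, 0.0, 1.0],
    ]


def run_cohort(progression_base, cost_per_state, utility_per_state, cycles):
    """Propagate a cohort and accumulate expected cost and QALYs (undiscounted)."""
    dist = [1.0, 0.0, 0.0]  # everyone starts in the stable state
    total_cost = 0.0
    total_qaly = 0.0
    for t in range(cycles):
        total_cost += sum(d * c for d, c in zip(dist, cost_per_state))
        total_qaly += sum(d * u for d, u in zip(dist, utility_per_state))
        P = transition_matrix(t, progression_base)  # matrix changes every cycle
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    return total_cost, total_qaly, dist


# Hypothetical inputs: the new therapy halves progression risk but costs more.
utilities = [0.8, 0.5, 0.0]
cost_old, qaly_old, dist_old = run_cohort(0.20, [20000, 30000, 0], utilities, 10)
cost_new, qaly_new, dist_new = run_cohort(0.10, [50000, 30000, 0], utilities, 10)

# Incremental cost-effectiveness ratio: extra cost per extra QALY.
icer = (cost_new - cost_old) / (qaly_new - qaly_old)
```

The model is non-stationary because `transition_matrix` depends on the cycle index `t`; a stationary model would reuse one matrix for every cycle.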
Extremal local linear quantile regression for nonlinear dependent processes
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108128. Pub Date: 2025-01-17. DOI: 10.1016/j.csda.2025.108128
Fengyang He, Huixia Judy Wang
Abstract: Estimating extreme conditional quantiles accurately in the presence of data sparsity in the tails is a challenging and important problem. While there is existing literature on quantile analysis, limited work has been done on capturing nonlinear relationships in dependent data structures for extreme quantile estimation. The authors propose a novel estimation procedure that combines the local linear quantile regression method and extreme value theory. They develop a new enhanced Hill estimator for the conditional extreme value index, constructed based on the local linear quantile estimators at a sequence of quantile levels. This approach allows for data-adaptive weights assigned to different quantiles, providing flexibility and potential for enhancing estimation efficiency. Furthermore, they propose an estimator for extreme conditional quantiles by extrapolating from the intermediate quantiles. Their methodology enables both point and interval estimation of extreme conditional quantiles for processes with an α-mixing dependence structure. They derive the Bahadur representation of the intermediate quantile estimators within the local linear extreme-quantile framework and establish the asymptotic properties of their proposed estimators. Simulation studies and real data analysis are conducted to demonstrate the effectiveness and performance of their methods.
Citations: 0
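For orientation, the sketch below shows the classical, unconditional Hill estimator and a Weissman-type extrapolation from an intermediate order statistic to an extreme quantile. This is a swapped-in textbook baseline, not the enhanced conditional estimator of the abstract above, which is built from local linear quantile estimates with data-adaptive weights. The function names and the toy sample are hypothetical.

```python
import math


def hill_estimator(sample, k):
    """Classical Hill estimate of the extreme value index from the k largest values."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = xs[n - k - 1]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest values must be positive")
    top = xs[n - k:]  # the k largest observations
    return sum(math.log(x / threshold) for x in top) / k


def extrapolate_quantile(sample, k, tau):
    """Weissman-type extrapolation of the tau-quantile from the level 1 - k/n."""
    n = len(sample)
    gamma = hill_estimator(sample, k)
    threshold = sorted(sample)[n - k - 1]
    return threshold * (k / (n * (1 - tau))) ** gamma


# Hypothetical toy sample with a heavy right tail.
sample = [1.0, 2.0, 4.0, 8.0, 16.0]
gamma_hat = hill_estimator(sample, 2)
q_hat = extrapolate_quantile(sample, 2, 0.9)
```

The choice of `k` trades bias against variance; in practice it is usually examined over a range of values rather than fixed in advance.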
Heterogeneity-aware transfer learning for high-dimensional linear regression models
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108129. Pub Date: 2025-01-16. DOI: 10.1016/j.csda.2025.108129
Yanjin Peng, Lei Wang
Abstract: Transfer learning can refine the performance of a target model through utilizing beneficial information from relevant source datasets. In practice, however, auxiliary samples may be collected from different sub-populations with non-negligible heterogeneity. In this paper we assume that each dataset involves a common parameter vector and dataset-specific nuisance parameters and extend the transfer learning framework to account for heterogeneous models. Specifically, we adapt the decorrelated score technique to deal with the dataset-specific nuisance parameters and develop a strategy to leverage possible shared information from relevant source datasets. To avoid negative transfer, a completely data-driven algorithm is provided to determine the transferable sources. The convergence rate of the proposed estimator is investigated and the source detection consistency is also verified. Extensive numerical experiments are conducted to evaluate the proposed transfer learning algorithms, and an application to the Genotype-Tissue Expression dataset is exhibited.
Citations: 0
Robust generalized canonical correlation analysis based on scatter matrices
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108126. Pub Date: 2025-01-10. DOI: 10.1016/j.csda.2025.108126
Nadia L. Kudraszow, Alejandra V. Vahnovan, Julieta Ferrario, M. Victoria Fasano
Abstract: Generalized Canonical Correlation Analysis (GCCA) is a powerful tool for analyzing and understanding linear relationships between multiple sets of variables. However, its classical estimations are highly sensitive to outliers, which can significantly affect the results of the analysis. A functional version of GCCA is proposed, based on scatter matrices, leading to robust and Fisher consistent estimators for appropriate choices of the scatter matrix. In cases where scatter matrices are ill-conditioned, a modification based on an estimation of the precision matrix is introduced. A procedure to identify influential observations is also developed. A simulation study evaluates the finite-sample performance of the proposed methods under clean and contaminated samples. The advantages of the influential data detection approach are demonstrated through an application to a real dataset.
Citations: 0
An efficient and distribution-free symmetry test for high-dimensional data based on energy statistics and random projections
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108123. Pub Date: 2025-01-08. DOI: 10.1016/j.csda.2024.108123
Bo Chen, Feifei Chen, Junxin Wang, Tao Qiu
Abstract: Testing the departures from symmetry is a critical issue in statistics. Over the last two decades, substantial effort has been invested in developing tests for central symmetry in multivariate and high-dimensional contexts. Traditional tests, which rely on Euclidean distance, face significant challenges in high-dimensional data. These tests struggle to capture overall central symmetry and are often limited to verifying whether the distribution's center aligns with the coordinate origin, a problem exacerbated by the “curse of dimensionality.” Furthermore, they tend to be computationally intensive, often making them impractical for large datasets. To overcome these limitations, we propose a nonparametric test based on the random projected energy distance, extending the energy distance test through random projections. This method effectively reduces data dimensions by projecting high-dimensional data onto lower-dimensional spaces, with the randomness ensuring maximum preservation of information. Theoretically, as the number of random projections approaches infinity, the risk of power loss from inadequate directions is mitigated. Leveraging U-statistic theory, our test's asymptotic null distribution is standard normal, which holds true regardless of the data dimensionality relative to sample size, thus eliminating the need for re-sampling to determine critical values. For computational efficiency with large datasets, we adopt a divide-and-average strategy, partitioning the data into smaller blocks of size m. Within each block, the estimates of the random projected energy distance are normally distributed. By averaging these estimates across all blocks, we derive a test statistic that is asymptotically standard normal. This method significantly reduces computational complexity from quadratic to linear in sample size, enhancing the feasibility of our test for extensive data analysis. Through extensive numerical studies, we have demonstrated the robust empirical performance of our test in terms of size and power, affirming its practical utility in statistical applications for high-dimensional data.
Citations: 0
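The divide-and-average idea in the abstract above can be sketched as follows, with the caveat that the exact statistic used in the paper is not given here. The sketch assumes a simple energy-type quantity: the U-statistic estimate of E|X + X'| - E|X - X'| for one-dimensional projected data, whose population value is zero when the distribution is symmetric about the origin. Standardization to a normal limit is omitted, and all function names are hypothetical.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalized Gaussians."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def project(points, direction):
    """Project each d-dimensional point onto the given direction."""
    return [sum(p_i * u_i for p_i, u_i in zip(p, direction)) for p in points]


def asymmetry_u_stat(xs):
    """U-statistic estimate of E|X + X'| - E|X - X'| for 1-D data (ASSUMED form)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(xs[i] + xs[j]) - abs(xs[i] - xs[j])
    return total / (n * (n - 1) / 2)


def block_averaged_stat(xs, m):
    """Average the statistic over consecutive blocks of size m (m >= 2).

    Each block costs O(m^2) and there are about n/m blocks, so the total
    cost is O(n * m): linear in n for fixed m, instead of O(n^2).
    """
    blocks = [xs[s:s + m] for s in range(0, len(xs) - m + 1, m)]
    return sum(asymmetry_u_stat(b) for b in blocks) / len(blocks)


# Hypothetical usage: project 3-D data onto one random direction, then average.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
direction = random_unit_vector(3, rng)
stat = block_averaged_stat(project(points, direction), 20)
```

A full test would repeat this over several random directions and standardize the result; those steps are left out because their details are not stated in the abstract.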
Sparse vertex discriminant analysis: Variable selection for biomedical classification applications
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108125. Pub Date: 2025-01-07. DOI: 10.1016/j.csda.2025.108125
Alfonso Landeros, Seyoon Ko, Jack Z. Chang, Tong Tong Wu, Kenneth Lange
Abstract: Modern biomedical datasets are often high-dimensional at multiple levels of biological organization. Practitioners must therefore grapple with data to estimate sparse or low-rank structures so as to adhere to the principle of parsimony. Further complicating matters is the presence of groups in data, each of which may have distinct associations with explanatory variables or be characterized by fundamentally different covariates. These themes in data analysis are explored in the context of classification. Vertex Discriminant Analysis (VDA) offers flexible linear and nonlinear models for classification that generalize the advantages of support vector machines to data with multiple classes. The proximal distance principle, which leverages projection and proximal operators in the design of practical algorithms, handily facilitates variable selection in VDA via nonconvex distance-to-set penalties directly controlling the number of active variables. Two flavors of sparse VDA are developed to address data in which instances may be homogeneous or heterogeneous with respect to predictors characterizing classes. Empirical studies illustrate how VDA is adapted to class-specific variable selection on simulated and real datasets, with an emphasis on applications to cancer classification via gene expression patterns.
Citations: 0
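The distance-to-set penalty mentioned in the abstract above relies on projecting a coefficient vector onto the set of vectors with at most k nonzero entries. A minimal sketch of that projection and the resulting squared distance is given below; it covers only this one building block, not the VDA loss or the proximal distance iterations, and the function names are hypothetical.

```python
def project_onto_sparse(v, k):
    """Euclidean projection onto {x : at most k nonzero entries}.

    Keeping the k entries of largest magnitude and zeroing the rest
    minimizes the Euclidean distance to v over this (nonconvex) set.
    """
    if k <= 0:
        return [0.0] * len(v)
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def squared_distance_to_sparse(v, k):
    """Squared distance from v to the k-sparse set: the penalty's core term."""
    p = project_onto_sparse(v, k)
    return sum((a - b) ** 2 for a, b in zip(v, p))


# Hypothetical coefficient vector; allow two active variables.
beta = [3.0, -1.0, 0.5, -4.0]
beta_sparse = project_onto_sparse(beta, 2)
penalty = squared_distance_to_sparse(beta, 2)
```

The penalty is zero exactly when `beta` already has at most k nonzero entries, which is how k directly controls the number of active variables.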
Component selection and variable selection for mixture regression models
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108124. Pub Date: 2025-01-06. DOI: 10.1016/j.csda.2024.108124
Xuefei Qi, Xingbai Xu, Zhenghui Feng, Heng Peng
Abstract: Finite mixture regression models are commonly used to account for heterogeneity in populations and situations where the assumptions required for standard regression models may not hold. To expand the range of applicable distributions for components beyond the Gaussian distribution, other distributions, such as the exponential power distribution, the skew-normal distribution, and so on, are explored. To enable simultaneous model estimation, order selection, and variable selection, a penalized likelihood estimation approach that imposes penalties on both the mixing proportions and regression coefficients, which we call the double-penalized likelihood method, is proposed in this paper. Four double-penalized likelihood functions and their performance are studied. The consistency of estimators, order selection, and variable selection are investigated. A modified expectation–maximization algorithm is proposed to implement the double-penalized likelihood method. Numerical simulations demonstrate the effectiveness of our proposed method and algorithm. Finally, the results of real data analysis are presented to illustrate the application of our approach. Overall, our study contributes to the development of mixture regression models and provides a useful tool for model and variable selection.
Citations: 0
Mean shift-based clustering for misaligned functional data
IF 1.5, Zone 3 (Mathematics)
Computational Statistics & Data Analysis, Volume 206, Article 108107. Pub Date: 2025-01-02. DOI: 10.1016/j.csda.2024.108107
Andrew Welbaum, Wanli Qiao
Abstract: Misalignment often occurs in functional data and can severely impact their clustering results. A clustering algorithm for misaligned functional data is developed, by adapting the original mean shift algorithm in the Euclidean space. This mean shift algorithm is applied to the quotient space of the orbits of the square root velocity functions induced by the misaligned functional data, in which the elastic distance is equipped. Convergence properties of this algorithm are studied. The efficacy of the algorithm is demonstrated through simulations and various real data applications.
Citations: 0
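The abstract above adapts the original Euclidean mean shift algorithm. As a reference point, here is a minimal sketch of that Euclidean version in one dimension with a Gaussian kernel. It does not touch square root velocity functions, orbits, or the elastic distance; the function names, bandwidth, and toy data are hypothetical.

```python
import math


def mean_shift_mode(points, start, bandwidth, tol=1e-10, max_iter=1000):
    """Iterate the Gaussian-kernel mean shift update from `start` until it stalls."""
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def cluster_by_mode(points, bandwidth, merge_tol=1e-3):
    """Assign each point to the mode its mean shift path converges to."""
    modes = []
    labels = []
    for p in points:
        m = mean_shift_mode(points, p, bandwidth)
        for idx, existing in enumerate(modes):
            if abs(m - existing) < merge_tol:
                labels.append(idx)
                break
        else:
            # No existing mode is close enough, so this is a new cluster.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


# Hypothetical toy data with two well-separated groups.
data = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
modes, labels = cluster_by_mode(data, bandwidth=0.5)
```

Each update moves the current location to a kernel-weighted average of the data, so points climb toward a local maximum of the kernel density estimate, and points sharing a maximum form one cluster.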