Model free feature screening for large scale and ultrahigh dimensional survival data

IF 0.8 · Zone 4, Mathematics · JCR Q3, Statistics & Probability
Yingli Pan, Haoyu Wang, Zhan Liu
Annals of the Institute of Statistical Mathematics, vol. 77, no. 1, pp. 155-190. Published 2024-10-19.
DOI: 10.1007/s10463-024-00912-x
https://link.springer.com/article/10.1007/s10463-024-00912-x
Citations: 0

Abstract


This paper provides a novel perspective on feature screening in the analysis of high-dimensional right-censored large-p-large-N survival data. The research introduces a distributed feature screening method known as Aggregated Distance Correlation Screening (ADCS). The proposed screening framework involves expressing the distance correlation measure as a function of multiple component parameters, each of which can be estimated in a distributed manner using a natural U-statistic from data segments. By aggregating the component estimates, a final correlation estimate is obtained, facilitating feature screening. Importantly, this approach does not necessitate any specific model specification for responses or predictors and is effective with heavy-tailed data. The study establishes the consistency of the proposed aggregated correlation estimator \(\widetilde{\omega }_{j}\) under mild conditions and demonstrates the sure screening property of the ADCS. Empirical results from both simulated and real datasets confirm the efficacy and practicality of the ADCS approach proposed in this paper.
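
The aggregation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' ADCS estimator: it assumes a fully observed response (the paper's treatment of right censoring is not reproduced here) and uses the standard decomposition of squared distance covariance, dCov²(X, Y) = E|X − X'||Y − Y'| + E|X − X'| · E|Y − Y'| − 2 E|X − X'||Y − Y''|. Each expectation is a component parameter estimated by a U-statistic on every data segment; the segment estimates are averaged, and only then combined into a squared distance correlation. The function names `segment_components`, `aggregated_dcor2`, and `screen` are hypothetical.

```python
import numpy as np


def segment_components(x, y):
    """U-statistic estimates of eight component parameters on one segment.

    Assumes 1-D arrays of equal length n >= 3 and an uncensored y.
    """
    n = len(x)
    a = np.abs(x[:, None] - x[None, :])  # pairwise |x_i - x_k|, zero diagonal
    b = np.abs(y[:, None] - y[None, :])  # pairwise |y_i - y_k|, zero diagonal
    ra, rb = a.sum(axis=1), b.sum(axis=1)
    n2 = n * (n - 1)            # number of ordered pairs with i != k
    n3 = n * (n - 1) * (n - 2)  # number of ordered triples, all distinct

    def pair(u, v):
        # estimates E[u_12 * v_12]
        return (u * v).sum() / n2

    def triple(u, ru, v, rv):
        # estimates E[u_12 * v_13]; subtracting (u * v).sum() drops k == l
        return (ru @ rv - (u * v).sum()) / n3

    return np.array([
        pair(a, b), a.sum() / n2, b.sum() / n2, triple(a, ra, b, rb),
        pair(a, a), triple(a, ra, a, ra),
        pair(b, b), triple(b, rb, b, rb),
    ])


def aggregated_dcor2(x, y, n_segments):
    """Squared distance correlation built from segment-averaged components."""
    xs = np.array_split(x, n_segments)
    ys = np.array_split(y, n_segments)
    parts = [segment_components(u, v) for u, v in zip(xs, ys)]
    sizes = [len(u) for u in xs]
    # Aggregate each component across segments, weighting by segment size.
    ab, ea, eb, ab3, aa, aa3, bb, bb3 = np.average(parts, axis=0, weights=sizes)
    dcov2_xy = ab + ea * eb - 2.0 * ab3
    dvar2_x = aa + ea ** 2 - 2.0 * aa3
    dvar2_y = bb + eb ** 2 - 2.0 * bb3
    denom = np.sqrt(max(dvar2_x, 0.0) * max(dvar2_y, 0.0))
    # Plug-in estimates can be slightly negative, so clamp at zero.
    return 0.0 if denom <= 0.0 else max(dcov2_xy, 0.0) / denom


def screen(X, y, n_segments, d):
    """Indices of the d features with the largest aggregated estimates."""
    omega = np.array([aggregated_dcor2(X[:, j], y, n_segments)
                      for j in range(X.shape[1])])
    return np.argsort(-omega)[:d]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 20))
    y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(1000)  # nonlinear signal
    print(screen(X, y, n_segments=5, d=3))
```

Averaging the components before forming the ratio, rather than averaging per-segment correlations, keeps each segment's contribution an unbiased U-statistic and confines the nonlinear plug-in step to a single final computation. In a genuinely distributed setting each segment would return only its eight numbers, so the pairwise distance matrices never leave the worker that holds the data.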

Source journal: Annals of the Institute of Statistical Mathematics
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 39
Review time: 6-12 weeks
Journal description: Annals of the Institute of Statistical Mathematics (AISM) aims to provide a forum for open communication among statisticians, and to contribute to the advancement of statistics as a science to enable humans to handle information in order to cope with uncertainties. It publishes high-quality papers that shed new light on the theoretical, computational and/or methodological aspects of statistical science. Emphasis is placed on (a) development of new methodologies motivated by real data, (b) development of unifying theories, and (c) analysis and improvement of existing methodologies and theories.