Distribution-free and model-free multivariate feature screening via multivariate rank distance correlation

IF 1.4 | Zone 3, Mathematics | JCR Q2, Statistics & Probability
Shaofei Zhao, Guifang Fu
Journal of Multivariate Analysis, Volume 192, Article 105081. Published 2022-11-01.
DOI: 10.1016/j.jmva.2022.105081
URL: https://www.sciencedirect.com/science/article/pii/S0047259X22000811
Citations: 3

Abstract

Feature screening approaches are effective in selecting active features from data with ultrahigh dimensionality and increasing complexity; however, many existing feature screening approaches are either restricted to a univariate response or rely on distributional or model assumptions. In this article, we propose a sure independence screening approach based on the multivariate rank distance correlation (MrDc-SIS). The MrDc-SIS achieves multiple desirable properties: it is distribution-free, completely nonparametric, scale-free, and robust to outliers and heavy tails. Moreover, the MrDc-SIS can be used to screen either univariate or multivariate responses and either one-dimensional or multi-dimensional predictors. We establish the theoretical sure screening and rank consistency properties of the MrDc-SIS approach under a mild condition, lifting the finite-moment assumptions required previously. Simulation studies demonstrate that MrDc-SIS outperforms eight other closely related approaches under certain settings. We also apply the MrDc-SIS approach to a multi-omics ovarian carcinoma dataset downloaded from The Cancer Genome Atlas (TCGA).
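
To make the screening idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one common construction of empirical multivariate ranks: each sample is matched to a fixed set of reference points in the unit cube by optimal assignment (in one dimension this reduces to ordinary ranks divided by n). Whether this matches the paper's exact rank definition is an assumption, as are the function names `multivariate_ranks`, `distance_correlation`, and `rank_dcor_screen`, the Halton reference points, and the choice to keep a fixed number of top-scoring predictors.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from scipy.stats import qmc


def multivariate_ranks(x):
    """Match n samples (n x d) to n fixed reference points in [0, 1]^d.

    Assumption of this sketch: reference points are i/n for d = 1 and an
    unscrambled Halton sequence for d > 1; the matching minimizes the total
    squared Euclidean distance.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    if d == 1:
        ref = (np.arange(1, n + 1) / n)[:, None]
    else:
        ref = qmc.Halton(d=d, scramble=False).random(n)
    cost = cdist(x, ref, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    ranks = np.empty_like(ref)
    ranks[rows] = ref[cols]  # sample i receives the reference point matched to it
    return ranks


def _double_center(dist):
    """Subtract row and column means and add back the grand mean."""
    return (dist - dist.mean(axis=0, keepdims=True)
            - dist.mean(axis=1, keepdims=True) + dist.mean())


def distance_correlation(a, b):
    """Sample (V-statistic) distance correlation between two n-row arrays."""
    A = _double_center(cdist(a, a))
    B = _double_center(cdist(b, b))
    dcov2 = max((A * B).mean(), 0.0)  # guard against tiny negative rounding
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(dcov2 / denom))


def rank_dcor_screen(X, Y, keep):
    """Score each column of X by the distance correlation between its ranks
    and the ranks of Y, then return the indices of the `keep` largest scores."""
    X = np.asarray(X, dtype=float)
    ranks_y = multivariate_ranks(Y)
    scores = np.array([
        distance_correlation(multivariate_ranks(X[:, j]), ranks_y)
        for j in range(X.shape[1])
    ])
    selected = np.argsort(-scores)[:keep]
    return selected, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    # Hypothetical bivariate response driven by the first two predictors,
    # with heavy-tailed (Cauchy) noise.
    noise = 0.1 * rng.standard_cauchy((n, 2))
    Y = np.column_stack([2.0 * X[:, 0], X[:, 1] ** 2]) + noise
    selected, scores = rank_dcor_screen(X, Y, keep=5)
    print("selected predictors:", selected)
```

Because the statistic is computed on ranks rather than raw values, no moments of the data enter the calculation, and in the one-dimensional case the ranks are unchanged by any strictly increasing transformation of a variable.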

Source journal
Journal of Multivariate Analysis (Mathematics: Statistics & Probability)
CiteScore: 2.40
Self-citation rate: 25.00%
Articles published: 108
Review time: 74 days
Journal description: Founded in 1971, the Journal of Multivariate Analysis (JMVA) is the central venue for the publication of new, relevant methodology and particularly innovative applications pertaining to the analysis and interpretation of multidimensional data. The journal welcomes contributions to all aspects of multivariate data analysis and modeling, including cluster analysis, discriminant analysis, factor analysis, and multidimensional continuous or discrete distribution theory. Topics of current interest include, but are not limited to, inferential aspects of copula modeling, functional data analysis, graphical modeling, high-dimensional data analysis, image analysis, multivariate extreme-value theory, sparse modeling, and spatial statistics.