Detection of influential observations in high-dimensional survival data

Q4 Mathematics
P. Divya, S. Suresh
Journal: Communications in Statistics Case Studies Data Analysis and Applications
Publication date: 2023-10-13
Publication type: Journal Article
DOI: 10.1080/23737484.2023.2266404 (https://doi.org/10.1080/23737484.2023.2266404)
Citations: 0

Abstract

Survival analysis is a statistical technique mainly used to analyze time-to-event data. Identifying influential observations is of particular importance because it can lead to the discovery of new prognostic factors. An influential observation in survival data typically corresponds to an individual whose survival time is extremely short or long in comparison to others. In particular, when the data contain more covariates than observations, all classical approaches fail to perform. Hence, dimensionality reduction is necessary for choosing appropriate variables, and it is carried out with popular techniques such as LASSO and the elastic net. This paper considers high-dimensional breast cancer data, whose dimensionality is reduced using variable selection methods. Subsequently, the rank product test and martingale residuals are used to identify influential observations. Furthermore, a resampling technique is used to validate the consistency and robustness of the methods. The novelty of this paper lies in comparing the prediction accuracy of datasets with and without outliers using Random Survival Forest (RSF) for different training fractions. The RSF results demonstrate that the LASSO approach outperforms the others in the absence of outliers. Therefore, we suggest reducing dimensionality using the LASSO variable selection technique first, followed by removing likely outliers to improve the performance of classification algorithms.

KEYWORDS: Survival analysis; variable selection methods; martingale residuals; rank product test; random survival forest

Disclosure statement: The authors declare that there is no conflict of interest regarding the publication of this paper.
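As a rough illustration of the martingale residuals named in the abstract, the sketch below computes M_i = d_i - exp(eta_i) * H0(t_i), where d_i is the event indicator, eta_i is a risk score, and H0 is a Breslow-type cumulative baseline hazard. It is not the authors' implementation: the function name `martingale_residuals`, the toy data, and the default of all-zero risk scores (a null model with no covariates) are assumptions made for this example. In practice the risk scores would come from a Cox model fitted on the selected variables.

```python
import math


def martingale_residuals(times, events, risk_scores=None):
    """Martingale residuals with a Breslow baseline hazard (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- hypothetical linear predictors eta_i; zeros give a null model
    """
    n = len(times)
    if risk_scores is None:
        risk_scores = [0.0] * n
    weights = [math.exp(s) for s in risk_scores]

    residuals = []
    for i in range(n):
        # Cumulative baseline hazard at t_i: sum over observed events up to t_i
        # of 1 / (total weight of subjects still at risk at that event time).
        cum_hazard = 0.0
        for j in range(n):
            if events[j] == 1 and times[j] <= times[i]:
                at_risk = sum(weights[k] for k in range(n) if times[k] >= times[j])
                cum_hazard += 1.0 / at_risk
        residuals.append(events[i] - weights[i] * cum_hazard)
    return residuals


# Toy data (made up for illustration): five subjects, three observed events.
times = [2, 3, 5, 7, 11]
events = [1, 0, 1, 1, 0]
residuals = martingale_residuals(times, events)

# Rank subjects by absolute residual; the largest are candidates for a closer look.
ranked = sorted(range(len(times)), key=lambda i: abs(residuals[i]), reverse=True)
```

Martingale residuals are bounded above by 1 and unbounded below. A value near 1 marks a subject whose event occurred earlier than the model expects, and a strongly negative value marks a subject who survived longer than expected. These are the two kinds of extreme survival time the abstract describes.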
Source journal
CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: 29