A robust ensemble feature selection approach to prioritize genes associated with survival outcome in high-dimensional gene expression data

Phi Le, Xingyue Gong, Leah Ung, Hai Yang, Bridget P Keenan, Li Zhang, Tao He
{"title":"A robust ensemble feature selection approach to prioritize genes associated with survival outcome in high-dimensional gene expression data","authors":"Phi Le, Xingyue Gong, Leah Ung, Hai Yang, Bridget P Keenan, Li Zhang, Tao He","doi":"10.3389/fsysb.2024.1355595","DOIUrl":null,"url":null,"abstract":"Exploring features associated with the clinical outcome of interest is a rapidly advancing area of research. However, with contemporary sequencing technologies capable of identifying over thousands of genes per sample, there is a challenge in constructing efficient prediction models that balance accuracy and resource utilization. To address this challenge, researchers have developed feature selection methods to enhance performance, reduce overfitting, and ensure resource efficiency. However, applying feature selection models to survival analysis, particularly in clinical datasets characterized by substantial censoring and limited sample sizes, introduces unique challenges. We propose a robust ensemble feature selection approach integrated with group Lasso to identify compelling features and evaluate its performance in predicting survival outcomes. Our approach consistently outperforms established models across various criteria through extensive simulations, demonstrating low false discovery rates, high sensitivity, and high stability. Furthermore, we applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, showcasing its effectiveness by generating a composite score based on the selected genes to correctly distinguish different subtypes of the patients. In summary, our proposed approach excels in selecting impactful features from high-dimensional data, yielding better outcomes compared to contemporary state-of-the-art models.","PeriodicalId":73109,"journal":{"name":"Frontiers in systems biology","volume":" 88","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in systems biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fsysb.2024.1355595","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Exploring features associated with the clinical outcome of interest is a rapidly advancing area of research. However, with contemporary sequencing technologies capable of identifying over thousands of genes per sample, there is a challenge in constructing efficient prediction models that balance accuracy and resource utilization. To address this challenge, researchers have developed feature selection methods to enhance performance, reduce overfitting, and ensure resource efficiency. However, applying feature selection models to survival analysis, particularly in clinical datasets characterized by substantial censoring and limited sample sizes, introduces unique challenges. We propose a robust ensemble feature selection approach integrated with group Lasso to identify compelling features and evaluate its performance in predicting survival outcomes. Our approach consistently outperforms established models across various criteria through extensive simulations, demonstrating low false discovery rates, high sensitivity, and high stability. Furthermore, we applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, showcasing its effectiveness by generating a composite score based on the selected genes to correctly distinguish different subtypes of the patients. In summary, our proposed approach excels in selecting impactful features from high-dimensional data, yielding better outcomes compared to contemporary state-of-the-art models.
在高维基因表达数据中优先选择与生存结果相关基因的稳健集合特征选择方法
探索与相关临床结果相关的特征是一个快速发展的研究领域。然而,由于当代的测序技术能够识别每个样本中超过数千个基因,因此在构建兼顾准确性和资源利用率的高效预测模型方面存在挑战。为了应对这一挑战,研究人员开发了特征选择方法来提高性能、减少过拟合并确保资源效率。然而,将特征选择模型应用于生存分析,尤其是应用于具有大量删减和有限样本量特点的临床数据集,会带来独特的挑战。我们提出了一种与组 Lasso 相结合的稳健集合特征选择方法,用于识别有说服力的特征,并评估其在预测生存结果方面的性能。通过大量模拟,我们的方法在各种标准上始终优于既有模型,显示出低错误发现率、高灵敏度和高稳定性。此外,我们还将该方法应用于《癌症基因组图谱》中的结直肠癌数据集,通过根据所选基因生成综合评分来正确区分患者的不同亚型,从而展示了该方法的有效性。总之,与当代最先进的模型相比,我们提出的方法在从高维数据中选择有影响的特征方面表现出色,能产生更好的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信