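A robust ensemble feature selection approach to prioritize genes associated with survival outcome in high-dimensional gene expression data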

Impact factor (IF): 2.3

Frontiers in systems biology. Published: 2024-03-21. DOI: 10.3389/fsysb.2024.1355595

Phi Le, Xingyue Gong, Leah Ung, Hai Yang, Bridget P Keenan, Li Zhang, Tao He


Citations: 0

Abstract



Identifying features associated with a clinical outcome of interest is a rapidly advancing area of research. However, because contemporary sequencing technologies can measure thousands of genes per sample, it is challenging to construct efficient prediction models that balance accuracy against resource utilization. To address this challenge, researchers have developed feature selection methods that enhance performance, reduce overfitting, and keep resource use manageable. Applying feature selection to survival analysis introduces further challenges, particularly in clinical datasets characterized by substantial censoring and limited sample sizes. We propose a robust ensemble feature selection approach integrated with group Lasso to identify compelling features, and we evaluate its performance in predicting survival outcomes. In extensive simulations, our approach consistently outperforms established models across various criteria, demonstrating low false discovery rates, high sensitivity, and high stability. Furthermore, we applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, showcasing its effectiveness by generating a composite score from the selected genes that correctly distinguishes patient subtypes. In summary, our proposed approach excels at selecting impactful features from high-dimensional data and yields better outcomes than contemporary state-of-the-art models.
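The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of ensemble feature selection on censored survival data, not the authors' method. It assumes a simple stand-in: each bootstrap resample ranks genes by a univariate Cox score test statistic, and genes that land in the top ranks in a large share of resamples are kept. The group Lasso integration mentioned in the abstract is not implemented here, and the function names, `top_k`, `n_boot`, `threshold`, and the simulated data are all hypothetical choices for illustration.

```python
import numpy as np

def cox_score_stats(X, time, event):
    """Per-feature Cox score test statistic at beta = 0.

    Assumption: tied times are broken arbitrarily by sort order.
    """
    order = np.argsort(-time)                 # descending time: risk set of row i is rows 0..i
    X = X[order]
    event = event[order].astype(bool)
    n_at_risk = np.arange(1, len(time) + 1)[:, None]
    mean = np.cumsum(X, axis=0) / n_at_risk   # risk-set mean of each feature
    var = np.cumsum(X ** 2, axis=0) / n_at_risk - mean ** 2
    U = (X[event] - mean[event]).sum(axis=0)  # score: observed minus expected at each event
    V = var[event].sum(axis=0)                # information: risk-set variances at each event
    return U ** 2 / np.maximum(V, 1e-12)

def ensemble_select(X, time, event, top_k=10, n_boot=100, threshold=0.6, seed=0):
    """Keep features ranked in the top_k in at least `threshold` of bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # bootstrap resample of the patients
        stats = cox_score_stats(X[idx], time[idx], event[idx])
        counts[np.argsort(stats)[-top_k:]] += 1
    freq = counts / n_boot                    # selection frequency per feature
    return np.flatnonzero(freq >= threshold), freq

# Hypothetical simulated data: only features 0 and 1 influence the hazard.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
t_event = rng.exponential(1.0 / np.exp(X[:, 0] - X[:, 1]))
t_cens = rng.exponential(2.0, size=n)         # independent censoring times
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens                     # True where the event was observed
selected, freq = ensemble_select(X, time, event, top_k=5)
```

Aggregating over resamples is what gives this kind of selector its stability: a gene that ranks highly only because of a few patients tends to drop out once those patients are resampled away. A penalized multivariate model, such as the group Lasso the abstract refers to, would replace or follow the univariate ranking step in a fuller implementation.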


Source journal

Frontiers in systems biology

Self-citation rate

0.00%

Publication count