Autoencoder techniques for survival analysis on renal cell carcinoma.

IF 2.9 | CAS Zone 3, Multidisciplinary Journals | JCR Q1, MULTIDISCIPLINARY SCIENCES
PLoS ONE Pub Date : 2025-05-15 eCollection Date: 2025-01-01 DOI:10.1371/journal.pone.0321045
Iñigo Sanz Ilundain, Laura Hernández-Lorenzo, Cristina Rodríguez-Antona, Jesús García-Donas, José L Ayala
PLoS ONE 20(5): e0321045. Journal Article. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12080797/pdf/

Abstract

Survival is the gold standard in oncology for determining the real impact of therapies on patient outcomes. Thus, identifying molecular predictors of survival (such as genetic alterations or transcriptomic patterns of gene expression) is one of the most relevant fields in current research. Statistical methods and metrics for analyzing time-to-event data are crucial to understanding disease progression and the effectiveness of treatments. However, medical data are often high-dimensional, which complicates the application of such methodologies. In this study, we addressed this challenge by using autoencoders to compress the high-dimensional transcriptomic data of patients treated with either immunotherapy (avelumab + axitinib) or a TKI (sunitinib) into meaningful latent features. We applied a semi-parametric statistical approach based on the Cox proportional hazards model, coupled with Breslow's estimator, to predict each patient's progression-free survival (PFS) and determine survival functions. Our analysis explored various penalty configurations and their combinations. Given the complexity of transcriptomic data, we extended our model to handle both tabular data and its graph variant, in which edges represent protein-protein interactions between genes, offering a more insightful approach. Recognizing the interpretability challenges inherent in neural networks, particularly autoencoders, we analyzed the mutual information between genes in the original data and their latent feature representations to clarify which genes are most associated with specific latent variables. The results indicate that different types of autoencoders are better suited to different tasks: denoising autoencoders excel at accurate reconstruction, while the sparse variant is more effective at producing meaningful representations. Additionally, combining these penalties enhances both reconstruction quality and the interpretability of latent features. The interpretable models identified genes such as LRP2 and ACE2 as highly relevant to renal cell carcinoma. This research underscores the utility of autoencoders in managing high-dimensional data problems.
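
The abstract names Breslow's estimator as the step that turns Cox risk scores into survival functions but gives no implementation details. The following is a minimal NumPy sketch of that general technique, not the authors' code. It assumes the risk scores (the linear predictor computed from the latent features) have already been fitted, and the function and argument names are hypothetical.

```python
import numpy as np

def breslow_cumulative_hazard(times, events, risk_scores):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    times:       observed progression or censoring time per patient
    events:      1 if progression was observed, 0 if censored
    risk_scores: fitted Cox linear predictor per patient (assumed given)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    hazard_ratio = np.exp(np.asarray(risk_scores, dtype=float))

    # Sorted distinct times at which at least one event occurred.
    event_times = np.unique(times[events == 1])
    increments = np.empty_like(event_times)
    for k, t in enumerate(event_times):
        n_events = np.sum((times == t) & (events == 1))
        # Risk set: everyone still under observation at time t.
        at_risk = hazard_ratio[times >= t].sum()
        increments[k] = n_events / at_risk
    return event_times, np.cumsum(increments)

def survival_function(event_times, cum_hazard, risk_score, t):
    """S(t | z) = exp(-H0(t) * exp(risk_score)) for one patient."""
    # Index of the last event time that is <= t (-1 if there is none).
    idx = np.searchsorted(event_times, t, side="right") - 1
    h0 = cum_hazard[idx] if idx >= 0 else 0.0
    return float(np.exp(-h0 * np.exp(risk_score)))
```

When every risk score is zero, the increments reduce to (number of events) / (number at risk), which is the Nelson-Aalen estimator. That makes a convenient sanity check for any implementation of this step.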
