Interpretable Transfer Learning for Cancer Drug Resistance: Candidate Target Identification.

IF 3 · Zone 3, Biology · JCR Q3, Biochemistry & Molecular Biology
Wenjie Zhang, Xisong Wu, Liang Chen, Xinyue Wan
Current Issues in Molecular Biology, vol. 47, no. 9. Published 2025-09-12. Journal Article.
DOI: 10.3390/cimb47090753 (https://doi.org/10.3390/cimb47090753)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468400/pdf/
Citations: 0

Abstract

Tumor drug resistance is highly heterogeneous across cancer types, reflecting distinct molecular mechanisms in each malignancy. To characterize this complexity, we developed a pan-cancer transfer learning framework that integrates bulk RNA-seq data with a residual variational autoencoder (Res VAE) backbone. Five models were trained on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, which includes drug response profiles for 72 chemotherapeutic agents. Three of the five combine variational autoencoders with large pretrained models (LLMs): the LLM large VAE (VAE_LL), the LLM small VAE (VAE_LS), and the LLM distillation VAE (VAE_LD). Random Forest (RF) and eXtreme Gradient Boosting (XGB) served as ensemble learning baselines. After internal cross-validation, the top four models (VAE_LL, VAE_LD, XGB, and RF) were applied to five representative TCGA cohorts comprising 1,836 patients. For each cancer type, resistance to nine clinically relevant first-line drugs was modeled, resulting in 180 drug-cancer prediction tasks. VAE_LD achieved the best overall performance, with a mean AUC of 0.81 and an F1 score of 0.92 on the GDSC benchmark, and maintained strong predictive power in the clinical validation phase. Interpretation analyses identified tumor-specific resistance biomarkers with clinical significance. In lung adenocarcinoma, elevated expression of TFF1 was repeatedly associated with resistance to Gefitinib and correlated with poor patient prognosis, indicating its potential as a therapeutic target. In glioblastoma, OPALIN, LTF, IL2RA, and SLC17A7 were implicated in Temozolomide resistance through pathways related to epithelial differentiation and angiogenesis. In conclusion, the VAE_LD model offers a high-performing and interpretable approach for predicting drug resistance across multiple tumor types. It supports the identification of clinically actionable biomarkers and provides a robust framework for precision oncology applications.
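
The abstract summarizes performance as a mean AUC and an F1 score. As a minimal sketch of how such figures are typically computed, the Python below assumes binary labels (1 = resistant, 0 = sensitive) and a predicted resistance score per sample. The function names, the 0.5 decision threshold, and the simple unweighted average across tasks are illustrative assumptions; the abstract does not state the authors' thresholds or averaging scheme.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 of the positive (resistant) class after thresholding the scores.

    The 0.5 default threshold is an assumption made for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_auc(tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean AUC over prediction tasks (one task per drug here)."""
    return sum(roc_auc(y, s) for y, s in tasks.values()) / len(tasks)


if __name__ == "__main__":
    # Toy data only; the drug names and values are hypothetical.
    toy_tasks = {
        "drug_A": ([1, 1, 1, 0, 0, 1, 0, 1],
                   [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]),
        "drug_B": ([0, 1, 0, 1], [0.1, 0.9, 0.8, 0.7]),
    }
    for name, (y, s) in toy_tasks.items():
        print(name, "AUC:", roc_auc(y, s), "F1:", f1_score(y, s))
    print("mean AUC:", mean_auc(toy_tasks))
```

AUC depends only on how samples are ranked, while F1 depends on the chosen threshold and on class balance, so the two can differ noticeably for the same model, as they do in the reported 0.81 and 0.92.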

Source journal
Current Issues in Molecular Biology (Biology: Biochemical Research Methods)
CiteScore: 2.90
Self-citation rate: 3.20%
Articles published: 380
Review time: >12 weeks
Journal description: Current Issues in Molecular Biology (CIMB) is a peer-reviewed journal publishing review articles and minireviews in all areas of molecular biology and microbiology. Submitted articles are subject to an Article Processing Charge (APC) and are open access immediately upon publication. All manuscripts undergo a peer-review process.