A generative deep neural network for pan-digestive tract cancer survival analysis.

IF 6.1, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining, published 2025-01-27, DOI: 10.1186/s13040-025-00426-z

Lekai Xu, Tianjun Lan, Yiqian Huang, Liansheng Wang, Junqi Lin, Xinpeng Song, Hui Tang, Haotian Cao, Hua Chai

Biodata Mining 18(1):9. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771125/pdf/


Abstract

Background: The accurate identification of molecular subtypes in digestive tract cancer (DTC) is crucial for making informed treatment decisions and selecting potential biomarkers. With the rapid advancement of artificial intelligence, various machine learning algorithms have been successfully applied in this field. However, the complexity and high dimensionality of the data features may lead to overlapping and ambiguous subtypes during clustering.

Results: In this study, we propose GDEC, a multi-task generative deep neural network designed for precise digestive tract cancer subtyping. The network is optimized with an integrated loss function composed of two modules: a generative-adversarial module that learns the spatial distribution of the data in order to extract high-quality information, and a clustering module that identifies disease subtypes. Experiments on digestive tract cancer datasets show that GDEC outperforms other state-of-the-art methods and separates cancer molecular subtypes with both statistical and biological significance. Based on the subtypes identified by GDEC, 21 hub genes related to pan-DTC heterogeneity and prognosis were then identified. A subsequent drug analysis suggested Dasatinib and YM155 as potential therapeutic agents for improving the prognosis of patients receiving pan-DTC immunotherapy, which may help improve cancer patient survival.

Conclusions: The experiments indicate that GDEC outperforms other deep-learning-based methods, and that this interpretable algorithm can select biologically significant genes and potential drugs for DTC treatment.
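The abstract says GDEC is trained with an integrated loss combining a generative-adversarial term and a clustering term, but it does not give the formulas. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes a clustering loss in the style of Deep Embedded Clustering (DEC): a Student's t soft assignment of each embedding to each cluster centroid, and the KL divergence from a sharpened target distribution. It also assumes a standard non-saturating generator loss and a weighting factor `gamma`. All names (`soft_assign`, `target_distribution`, `clustering_loss`, `generator_adv_loss`, `integrated_loss`, `hard_labels`, `z`, `centroids`, `d_fake`, `gamma`) are hypothetical. The code is plain Python so that it stays self-contained. A real model would compute these terms in an autodiff framework on encoder embeddings and discriminator outputs.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def soft_assign(z: Sequence[Sequence[float]],
                centroids: Sequence[Sequence[float]],
                alpha: float = 1.0) -> Matrix:
    """DEC-style Student's t soft assignment q_ij of embedding z_i to cluster j (assumed form)."""
    q: Matrix = []
    for zi in z:
        # Unnormalized kernel: (1 + ||z_i - mu_j||^2 / alpha) ** (-(alpha + 1) / 2)
        row = [
            (1.0 + sum((a - b) ** 2 for a, b in zip(zi, mu)) / alpha) ** (-(alpha + 1.0) / 2.0)
            for mu in centroids
        ]
        total = sum(row)
        q.append([v / total for v in row])  # normalize so each row sums to 1
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened target: p_ij proportional to q_ij**2 / f_j, with f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]  # soft cluster frequencies
    p: Matrix = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def clustering_loss(p: Matrix, q: Matrix) -> float:
    """KL(P || Q), summed over samples and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0.0
    )


def generator_adv_loss(d_fake: Sequence[float], eps: float = 1e-12) -> float:
    """Non-saturating generator loss: -mean(log D(G(.))) over discriminator probabilities."""
    return -sum(math.log(max(d, eps)) for d in d_fake) / len(d_fake)


def integrated_loss(z, centroids, d_fake, gamma: float = 0.1) -> float:
    """Hypothetical total loss: adversarial term plus gamma-weighted clustering term."""
    q = soft_assign(z, centroids)
    p = target_distribution(q)
    return generator_adv_loss(d_fake) + gamma * clustering_loss(p, q)


def hard_labels(q: Matrix) -> List[int]:
    """Subtype label per sample: the cluster with the largest soft assignment."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    # Toy 2-D embeddings forming two well-separated groups (illustrative data only).
    z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    centroids = [[0.0, 0.0], [5.0, 5.0]]
    print(hard_labels(soft_assign(z, centroids)))  # [0, 0, 1, 1]
    print(integrated_loss(z, centroids, d_fake=[0.4, 0.6, 0.5, 0.7]))
```

In DEC the target distribution is usually held fixed while the network is updated and then recomputed periodically. Squaring `q` and dividing by the cluster frequency sharpens confident assignments and keeps large clusters from dominating. How GDEC actually balances the two terms is not stated here.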


Source journal

Biodata Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.90
Self-citation rate: 0.00%
Review time: 23 weeks

Journal description: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.