Hierarchical learning of gastric cancer molecular subtypes by integrating multi‐modal DNA‐level omics data and clinical stratification

Binyu Yang, Siying Liu, Jiemin Xie, Xi Tang, Pan Guan, Yifan Zhu, Xuemei Liu, Yunhui Xiong, Zuli Yang, Weiyao Li, Yonghua Wang, Wen Chen, Qingjiao Li, Li C. Xia
Quantitative Biology. Published 2024-05-13. DOI: 10.1002/qub2.45 (https://doi.org/10.1002/qub2.45)

Abstract

Molecular subtyping of gastric cancer (GC) aims to characterize its genetic landscape. However, the efficacy of current subtyping methods is hampered by their mixed use of molecular features, a lack of strategy optimization, and the limited availability of public GC datasets. There is a pressing need for a precise and easily adoptable subtyping approach for early DNA‐based screening and treatment. Based on the TCGA subtypes, we developed a novel DNA‐based hierarchical classifier for gastric cancer molecular subtyping (HCG), which employs gene mutations, copy number aberrations, and methylation patterns as predictors. By incorporating the closely related esophageal adenocarcinoma dataset, we expanded the TCGA GC dataset used for training and testing HCG (n = 453). HCG was optimized by comparing three hierarchical strategies built on Lasso‐Logistic regression, which were evaluated by their overall area under the receiver operating characteristic curve (auROC), accuracy, F1 score, and area under the precision‐recall curve (auPRC), and by their capability for clinical stratification using multivariate survival analysis. Subtype‐specific DNA alteration biomarkers were identified through difference tests based on the HCG‐defined subtypes. Our HCG classifier demonstrated superior performance in terms of overall auROC (0.95), accuracy (0.88), F1 score (0.87), and auPRC (0.86), and significantly improved the clinical stratification of patients (overall p‐value = 0.032). Difference tests identified 25 subtype‐specific DNA alterations, including a high mutation rate in the SYNE1, ITGB4, and COL22A1 genes for the MSI subtype, and hypermethylation of the ALS2CL, KIAA0406, and RPRD1B genes for the EBV subtype. HCG is an accurate and robust classifier for DNA‐based GC molecular subtyping with highly predictive clinical stratification performance. The training and test datasets, along with the analysis programs of HCG, are accessible on GitHub (github.com/LabxSCUT).
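To make the idea of a hierarchical Lasso‐Logistic classifier concrete, the following is a minimal sketch and not the authors' implementation (their programs are at github.com/LabxSCUT). It assumes one possible cascade, in which a binary L1‐penalized logistic model peels off one subtype at each level (EBV, then MSI, then GS, with CIN as the remainder); the abstract does not state which of the three strategies uses which order, so this order is an assumption. The data are synthetic, all function names are hypothetical, and the model is fitted with a plain proximal gradient loop chosen only to keep the example dependency-free.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=300):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_hierarchy(X, labels, order, lam=0.05):
    """Train one binary model per level; each level sees only the samples
    that do not belong to a subtype handled at an earlier level."""
    models, rows = [], list(range(len(X)))
    for subtype in order:
        Xs = [X[i] for i in rows]
        ys = [1.0 if labels[i] == subtype else 0.0 for i in rows]
        models.append((subtype, fit_lasso_logistic(Xs, ys, lam=lam)))
        rows = [i for i in rows if labels[i] != subtype]
    return models

def predict_hierarchy(models, x, fallback):
    """Walk down the cascade and return the first subtype that claims x."""
    for subtype, model in models:
        if predict_proba(model, x) >= 0.5:
            return subtype
    return fallback

# Assumed cascade order; the last subtype is the remainder class.
SUBTYPES = ["EBV", "MSI", "GS", "CIN"]

def simulate(rng, n_per_subtype=30, n_features=6):
    """Synthetic stand-in data: feature k is elevated in subtype k."""
    X, labels = [], []
    for k, subtype in enumerate(SUBTYPES):
        for _ in range(n_per_subtype):
            row = [rng.gauss(0.0, 0.3) for _ in range(n_features)]
            row[k] += 2.0
            X.append(row)
            labels.append(subtype)
    return X, labels

rng = random.Random(0)
X_train, y_train = simulate(rng)
X_test, y_test = simulate(rng)
models = fit_hierarchy(X_train, y_train, order=SUBTYPES[:3])
predictions = [predict_hierarchy(models, x, fallback="CIN") for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In this sketch the L1 penalty drives the weights of uninformative features toward zero, which is why Lasso‐type models are a natural fit when only a few mutation, copy number, or methylation features are expected to mark each subtype. Comparing alternative cascade orders would amount to calling `fit_hierarchy` with a different `order` and fallback and scoring each variant on the same held-out data.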