Machine learning algorithm and deep neural networks identified a novel subtype in hepatocellular carcinoma.

Quan Zi, Hanwei Cui, Wei Liang, Qingjia Chi
Cancer biomarkers: section A of Disease markers (2022), pp. 305-320. DOI: 10.3233/CBM-220147
Journal impact factor: 1.9. Citations: 1

Abstract

Background: Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Because the early stage of the disease lacks specific clinical features, patients are usually diagnosed only once the disease has progressed to an advanced stage.

Objective: This study used machine learning algorithms to identify key genes involved in the progression of hepatocellular carcinoma and built a model to predict the survival risk of HCC patients.

Methods: Transcriptome data and clinical information were downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). Differential expression analysis and a Cox proportional-hazards model were used to identify survival-related genes. K-means clustering, random forests, and LASSO regression were used to identify novel HCC subtypes and to screen for key genes. The prediction model was built with a deep neural network (DNN), and Gene Set Enrichment Analysis (GSEA) was used to identify the metabolic pathways in which the key genes are involved.
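
The abstract does not give the exact order or settings of the clustering and gene-screening steps. As a minimal sketch under stated assumptions, the code below splits samples into two hypothetical subtypes with K-means (k = 2) and then fits a LASSO regression against the cluster labels, keeping genes with non-zero coefficients as candidate key genes. The random-forest step is omitted, the data are synthetic, and the helper names `kmeans` and `lasso_cd` as well as all parameter values (for example `alpha=0.1`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kmeans(X, k=2, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns cluster labels."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        # Next seed: the sample farthest from every centroid chosen so far.
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = X[idx].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def lasso_cd(X, y, alpha, n_iter=200):
    """LASSO by cyclic coordinate descent:
    minimises (1/2n)||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of gene j with the residual after removing its own contribution.
            rho = X[:, j] @ resid / n + col_sq[j] * w[j]
            # Soft-thresholding shrinks weak genes exactly to zero.
            new_wj = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (w[j] - new_wj)
            w[j] = new_wj
    return w


# Synthetic expression matrix: 100 samples x 20 genes; genes 0-3 differ between
# two hypothetical subtypes, the remaining genes are noise.
rng = np.random.default_rng(42)
n_per, p, n_inf = 50, 20, 4
truth = np.repeat([0, 1], n_per)
X = rng.normal(size=(2 * n_per, p))
X[truth == 1, :n_inf] += 4.0

labels = kmeans(X, k=2)                       # unsupervised subtype assignment
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # z-score each gene before LASSO
y = 2.0 * labels - 1.0
y -= y.mean()
w = lasso_cd(Z, y, alpha=0.1)
selected = np.flatnonzero(w)                  # candidate "key genes"
print(sorted(selected.tolist()))
```

Raising `alpha` keeps fewer genes; in practice it would be tuned, for example by cross-validation.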

Results: Two subtypes with significantly different survival were identified (p < 0.0001, AUC = 0.720), together with 17 key genes associated with the subtypes. The DNN prediction model achieved an accuracy greater than 93.3%. GSEA showed that the survival-related genes were significantly enriched in hallmark gene sets from the MSigDB database.
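
The abstract reports a survival p-value, an AUC, and a classification accuracy but does not name the tests used to obtain them. A log-rank test is a common way to compare survival between two groups, and AUC can be computed from ranks; the sketch below assumes both, using hypothetical helper names (`roc_auc`, `accuracy`, `logrank_test`) and toy data rather than the study's cohort.

```python
import math


def roc_auc(scores, labels):
    """Rank-based AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom.
    events: 1 = death observed, 0 = censored; groups: 0 or 1."""
    observed1 = expected1 = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        deaths = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        d, d1 = len(deaths), sum(deaths)
        observed1 += d1
        expected1 += d * n1 / n                 # deaths expected in group 1 under H0
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of chi-square(1): P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))   # 0.75
print(accuracy([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.5
# Toy cohort: group 0 dies at times 1-5, group 1 at times 6-10.
print(logrank_test(list(range(1, 11)), [1] * 10, [0] * 5 + [1] * 5))
```

`logrank_test` divides by the pooled variance, so it assumes at least one event time at which both groups are still at risk.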

Conclusions: In this study, we used machine learning algorithms to screen 17 genes associated with the survival risk of HCC patients and trained a DNN model on these genes to predict patients' survival risk. All of the genes in the model are key genes that affect the formation and development of cancer.
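
The abstract does not describe the DNN architecture. The following sketch assumes a small fully connected ReLU network with a sigmoid output, trained by full-batch gradient descent on binary cross-entropy to separate hypothetical high- and low-risk patients using 17 input genes. The layer sizes, learning rate, epoch count, the helper names `train_mlp` and `predict`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def train_mlp(X, y, hidden=(32, 16), lr=0.1, epochs=1000, seed=0):
    """Full-batch gradient descent for a ReLU MLP with a sigmoid output (binary risk label)."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    # He initialisation suits ReLU layers.
    Ws = [rng.normal(0, np.sqrt(2 / m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) for n in sizes[1:]]
    y = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        # Forward pass, caching each layer's activations.
        acts = [X]
        for W, b in zip(Ws[:-1], bs[:-1]):
            acts.append(np.maximum(acts[-1] @ W + b, 0.0))
        z = np.clip(acts[-1] @ Ws[-1] + bs[-1], -30, 30)
        prob = 1 / (1 + np.exp(-z))
        # Gradient of mean binary cross-entropy w.r.t. the output logit.
        delta = (prob - y) / len(X)
        for i in range(len(Ws) - 1, -1, -1):
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:
                # Back-propagate through layer i's weights and the preceding ReLU.
                delta = (delta @ Ws[i].T) * (acts[i] > 0)
            Ws[i] -= lr * gW
            bs[i] -= lr * gb
    return Ws, bs


def predict(Ws, bs, X):
    """Predicted probability of the high-risk class for each row of X."""
    h = X
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    z = np.clip(h @ Ws[-1] + bs[-1], -30, 30)
    return (1 / (1 + np.exp(-z))).ravel()


# Synthetic stand-in: 300 patients x 17 genes with a hypothetical risk label.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 17))
true_w = rng.normal(size=17)
y = (X @ true_w > 0).astype(int)

Ws, bs = train_mlp(X, y)
probs = predict(Ws, bs, X)
train_acc = np.mean((probs >= 0.5) == y)
print(f"training accuracy: {train_acc:.3f}")
```

Accuracy on the training data only checks that the network fits; a held-out test set or cross-validation would be needed for an estimate comparable to the reported 93.3%.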
