Deep learning model with L1 penalty for predicting breast cancer metastasis using gene expression data

IF 6.3 2区 物理与天体物理 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jaeyoon Kim, Minhyeok Lee, Junhee Seok
{"title":"Deep learning model with L1 penalty for predicting breast cancer metastasis using gene expression data","authors":"Jaeyoon Kim, Minhyeok Lee, Junhee Seok","doi":"10.1088/2632-2153/acd987","DOIUrl":null,"url":null,"abstract":"Breast cancer has the highest incidence and death rate among women; moreover, its metastasis to other organs increases the mortality rate. Since several studies have reported gene expression and cancer prognosis to be related, the study of breast cancer metastasis using gene expression is crucial. To this end, a novel deep neural network architecture, deep learning-based cancer metastasis estimator (DeepCME), is proposed in this paper for predicting breast cancer metastasis. However, the problem of overfitting occurs frequently while training deep learning models using gene expression data because they contain a large number of genes and the sample size is rather small. To address overfitting, several regularization methods are implemented, such as L1 penalty, batch normalization, and dropout. To demonstrate the superior performance of our model, area under curve (AUC) scores are evaluated and then compared with five baseline models: logistic regression, support vector classifier (SVC), random forest, decision tree, and k-nearest neighbor. Considering results, DeepCME demonstrates the highest average AUC scores in most cross-validation cases, and the average AUC score of DeepCME is 0.754, which is approximately 12.9% higher than SVC, the second-best model. In addition, the 30 most significant genes related to breast cancer metastasis are identified based on DeepCME results and some are discussed in further detail considering the reports from some previous medical studies. Considering the high expense involved in measuring the expression of a single gene, the ability to develop the cost-effective and time-efficient tests using only a few key genes is valuable. Based on this study, we expect DeepCME to be utilized clinically for predicting breast cancer metastasis and be applied to other types of cancer as well after further research.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":" ","pages":""},"PeriodicalIF":6.3000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning Science and Technology","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1088/2632-2153/acd987","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 2

Abstract

Breast cancer has the highest incidence and death rate among women; moreover, its metastasis to other organs increases the mortality rate. Since several studies have reported gene expression and cancer prognosis to be related, the study of breast cancer metastasis using gene expression is crucial. To this end, a novel deep neural network architecture, deep learning-based cancer metastasis estimator (DeepCME), is proposed in this paper for predicting breast cancer metastasis. However, the problem of overfitting occurs frequently while training deep learning models using gene expression data because they contain a large number of genes and the sample size is rather small. To address overfitting, several regularization methods are implemented, such as L1 penalty, batch normalization, and dropout. To demonstrate the superior performance of our model, area under curve (AUC) scores are evaluated and then compared with five baseline models: logistic regression, support vector classifier (SVC), random forest, decision tree, and k-nearest neighbor. Considering results, DeepCME demonstrates the highest average AUC scores in most cross-validation cases, and the average AUC score of DeepCME is 0.754, which is approximately 12.9% higher than SVC, the second-best model. In addition, the 30 most significant genes related to breast cancer metastasis are identified based on DeepCME results and some are discussed in further detail considering the reports from some previous medical studies. Considering the high expense involved in measuring the expression of a single gene, the ability to develop the cost-effective and time-efficient tests using only a few key genes is valuable. Based on this study, we expect DeepCME to be utilized clinically for predicting breast cancer metastasis and be applied to other types of cancer as well after further research.
利用基因表达数据预测癌症转移的L1惩罚深度学习模型
癌症在女性中发病率和死亡率最高;此外,它转移到其他器官会增加死亡率。由于一些研究报道了基因表达与癌症预后相关,因此利用基因表达研究癌症转移至关重要。为此,本文提出了一种新的深度神经网络结构——基于深度学习的癌症转移估计器(DeepCME),用于预测癌症转移。然而,在使用基因表达数据训练深度学习模型时,过拟合问题经常发生,因为它们包含大量基因,并且样本量相当小。为了解决过拟合问题,实现了几种正则化方法,如L1惩罚、批量归一化和丢弃。为了证明我们模型的优越性能,评估了曲线下面积(AUC)得分,然后将其与五个基线模型进行比较:逻辑回归、支持向量分类器(SVC)、随机森林、决策树和k近邻。从结果来看,DeepCME在大多数交叉验证案例中的平均AUC得分最高,DeepCME的平均AUC得分为0.754,比第二好模型SVC高出约12.9%。此外,根据DeepCME的结果,确定了与癌症转移相关的30个最重要的基因,并考虑到之前一些医学研究的报告,对其中一些基因进行了进一步详细的讨论。考虑到测量单个基因表达所涉及的高昂费用,仅使用少数关键基因开发成本效益高且时效性强的测试的能力是有价值的。基于这项研究,我们期望DeepCME在临床上用于预测癌症转移,并在进一步研究后应用于其他类型的癌症。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Machine Learning Science and Technology
Machine Learning Science and Technology Computer Science-Artificial Intelligence
CiteScore
9.10
自引率
4.40%
发文量
86
审稿时长
5 weeks
期刊介绍: Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivated by scientific problems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信