Predicting factors and top gene identification for survival data of breast cancer

IF 1.1 Q4 BIOPHYSICS
Sarada Ghosh, Guruprasad Samanta, Manuel De la Sen
AIMS Biophysics, published 2023-01-01. DOI: 10.3934/biophy.2023006 (https://doi.org/10.3934/biophy.2023006)

Abstract

For high-throughput research on biological data sets generated sequentially or by transcriptional microarrays, proteomics or other means, analytic techniques that address their high-dimensional nature remain desirable. The computational part predicts the tendency towards mortality due to breast cancer (BC) using several classification methods, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Decision Tree (DT), and compares the models' performance. We proceed with the RF method since it gives better accuracy than the other models considered. We also present some traditional and competing-risk models, illustrate them with real data analysis, describe the behaviour of their curves, and compare their fits using prediction error curves and the concordance index. Furthermore, two different survival splitting rules are applied in separate Random Survival Forest (RSF) models, and a ranking of breast cancer risk factors is constructed. The results show that high grade and diameter are the most important predictors of mortality progression in the presence of competing events of death, and that lymph nodes, age and angiography are other vital criteria. We have also implemented RSF backward selection, which enables selection of the top genes related to mortality progression due to breast cancer. This method identifies c-MYB, CDCA7, NUSAP1, BIRC5, ANGPTL4, JAG1, IL6ST and further genes as mainly responsible for mortality progression due to breast cancer. In this work, R software is used to obtain and evaluate the results.
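The abstract compares model fits using the concordance index. As a minimal sketch of what that statistic measures, the following computes Harrell's concordance index for right-censored data in plain Python. It is not the authors' code (the paper reports using R), and the function name `concordance_index`, its arguments, and the toy numbers are assumptions made purely for illustration. Competing risks are ignored here; every observed event is treated as the single event of interest.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time per subject
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time ends in an observed event rather than censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier failure got the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the second one censored.
toy_times = [5, 8, 12, 3]
toy_events = [1, 0, 1, 1]
toy_scores = [0.9, 0.4, 0.2, 0.7]
c_index = concordance_index(toy_times, toy_events, toy_scores)
```

A value of 1 means the model ranks every comparable pair correctly, 0.5 corresponds to random ranking, and 0 to a fully reversed ranking. The quadratic pair loop is chosen for readability, not for speed on large cohorts.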
Source journal: AIMS Biophysics
CiteScore: 2.40
Self-citation rate: 20.00%
Articles published: 16
Review time: 8 weeks
Journal description: AIMS Biophysics is an international Open Access journal devoted to publishing peer-reviewed, high-quality, original papers in the field of biophysics. It publishes the following article types: original research articles, reviews, editorials, letters, and conference reports. AIMS Biophysics welcomes papers on topics including, but not limited to: structural biology, biophysical technology, bioenergetics, membrane biophysics, cellular biophysics, electrophysiology, neuro-biophysics, biomechanics, and systems biology.