Feature selection and classification approaches in gene expression of breast cancer

Impact factor: 1.1, JCR Q4 (Biophysics)
Sarada Ghosh, Guruprasad Samanta, M. de La Sen
Journal: AIMS Biophysics
Published: 2021-01-01
DOI: 10.3934/biophy.2021029 (https://doi.org/10.3934/biophy.2021029)
Citations: 0

Abstract

DNA microarray technology can monitor the expression levels of thousands of genes simultaneously, and microarray data analysis is important for the phenotype classification of diseases. In this work, the computational part predicts the tendency towards mortality by identifying features in a high-dimensional dataset and applying different classification techniques. We analyze breast cancer transcriptional genomic data of 1554 transcripts from 272 samples. We present methods for gene classification using Logistic Regression (LR), Random Forest (RF) and Decision Tree (DT), and construct a classifier on the selected features that is more accurate than one using all features together. The performance of these methods is also compared with a dimension reduction method, namely Principal Component Analysis (PCA). Feature reduction with RF, LR and DT provides better performance than PCA. Both LR and RF identify the TYMP, ERS1, C-MYB and TUBA1a genes, whereas features corresponding to genes such as ARID4B, DNMT3A, TOX3, RGS17 and PNLIP are identified only by LR; these genes play a significant role in breast cancer. The analysis was carried out in R.
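
The idea of selecting features with a classifier and then refitting on the reduced set can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the paper used R on real microarray data, whereas this sketch uses standard-library Python, a synthetic dataset, and hypothetical helper names (`make_synthetic`, `fit_logistic`, `rank_features`). It ranks features by the absolute value of their logistic regression coefficients and refits on the top-ranked ones; the RF, DT and PCA comparisons are omitted.

```python
import math
import random


def make_synthetic(n_samples=200, n_features=10, informative=(0, 1), seed=0):
    """Synthetic 'expression' matrix: only the informative columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 if label == 1 else -2.0
        X.append(row)
        y.append(label)
    return X, y


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(row, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


def fit_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """Batch gradient descent for L2-regularised logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = predict_proba(row, w, b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_features(w):
    """Feature indices ordered by decreasing absolute coefficient."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


def accuracy(X, y, w, b):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(row, w, b) >= 0.5) == (label == 1)
               for row, label in zip(X, y))
    return hits / len(y)


# Fit on all features, keep the two highest-ranked ones, then refit on those.
X, y = make_synthetic()
w_all, b_all = fit_logistic(X, y)
ranking = rank_features(w_all)
selected = ranking[:2]
X_sel = [[row[j] for j in selected] for row in X]
w_sel, b_sel = fit_logistic(X_sel, y)
acc_selected = accuracy(X_sel, y, w_sel, b_sel)  # training accuracy only

print("selected feature indices:", selected)
print("training accuracy on selected features:", acc_selected)
```

Accuracy here is measured on the training data for brevity; a real analysis would use held-out samples or cross-validation, and coefficient magnitudes are only comparable across features when the features are on similar scales.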
Source journal
AIMS Biophysics
CiteScore: 2.40
Self-citation rate: 20.00%
Articles published: 16
Review time: 8 weeks
About the journal: AIMS Biophysics is an international Open Access journal devoted to publishing peer-reviewed, high-quality, original papers in the field of biophysics. It publishes the following article types: original research articles, reviews, editorials, letters, and conference reports. AIMS Biophysics welcomes papers on topics including, but not limited to: structural biology, biophysical technology, bioenergetics, membrane biophysics, cellular biophysics, electrophysiology, neuro-biophysics, biomechanics, and systems biology.