Feature selection for classification based on machine learning algorithms for prostate cancer.

IF 1.3 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Swathypriyadharsini P, Rupashini P R, Premalatha K
Biomedical Physics & Engineering Express, 11 (3). Published 2025-04-30. Journal Article. DOI: 10.1088/2057-1976/adcf2b (https://doi.org/10.1088/2057-1976/adcf2b)
Citations: 0

Abstract

Microarray technology has substantially advanced biotechnological research in recent years. It measures the expression levels of the many genes involved in a particular disease. Prostate cancer has become a life-threatening disease, and the genes responsible for it can be identified through classification methods. Gene expression data, however, are high-dimensional with small sample sizes, which poses serious challenges for existing classification algorithms. Feature selection techniques are applied to address this dimensionality problem. This paper analyzes feature selection methods for classifying prostate gene expression data and identifies the significant genes that have a major influence on the disease. Three types of feature selection methods (filters, wrappers, and embedded selectors) are applied before classification to select the top-ranked genes. The selected top-ranked genes are then passed to the classification algorithms: SVM, k-NN, Random Forest, and Artificial Neural Network. With feature selection, classification accuracy improves significantly even with fewer genes. The Random Forest classification algorithm outperforms the other classification methods. The significant genes identified as having a major influence on prostate cancer include KLK3, GFI1, CXCR2, and TNFRSF10C.
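
The select-then-classify workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not say which filter, wrapper, or embedded selectors were used, so everything here is an assumption made for illustration: the signal-to-noise filter score, the synthetic data standing in for a microarray matrix, the plain k-NN classifier, and all function names (`make_synthetic`, `rank_genes`, `loo_accuracy`). Gene ranking is repeated inside each leave-one-out fold so that the held-out sample never influences which genes are selected.

```python
import math
import random


def make_synthetic(n_per_class=20, n_genes=200, n_informative=4, shift=4.0, seed=0):
    """Toy stand-in for a microarray matrix: few samples, many genes.

    Only the first `n_informative` genes differ between the two classes.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def snr_score(values0, values1):
    """Filter score (assumed here): |mean0 - mean1| / (sd0 + sd1)."""
    def mean_sd(v):
        m = sum(v) / len(v)
        var = sum((a - m) ** 2 for a in v) / (len(v) - 1)
        return m, math.sqrt(var)

    m0, s0 = mean_sd(values0)
    m1, s1 = mean_sd(values1)
    return abs(m0 - m1) / (s0 + s1 + 1e-12)


def rank_genes(X, y):
    """Return gene indices sorted from most to least discriminative."""
    scores = []
    for g in range(len(X[0])):
        v0 = [row[g] for row, lab in zip(X, y) if lab == 0]
        v1 = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((snr_score(v0, v1), g))
    scores.sort(reverse=True)
    return [g for _, g in scores]


def knn_predict(X_train, y_train, x, genes, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((row[g] - x[g]) ** 2 for g in genes), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = sum(lab for _, lab in dists[:k])
    return 1 if votes * 2 > k else 0


def loo_accuracy(X, y, top_k=None, k=3):
    """Leave-one-out accuracy; genes are re-ranked inside every fold."""
    correct = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        genes = rank_genes(X_tr, y_tr)[:top_k] if top_k else range(len(X[0]))
        correct += knn_predict(X_tr, y_tr, X[i], genes, k) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_synthetic()
    print("top-ranked genes:", rank_genes(X, y)[:4])
    print("LOO accuracy, all genes:  ", loo_accuracy(X, y))
    print("LOO accuracy, top 4 genes:", loo_accuracy(X, y, top_k=4))
```

A filter score is used because it is the cheapest of the three families to show without external libraries. A wrapper or embedded selector would replace `rank_genes` while leaving the evaluation loop unchanged.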

Source journal
Biomedical Physics & Engineering Express (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 2.80
Self-citation rate: 0.00%
Articles published: 153
Journal description: BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. Characterized by a broad geographical coverage and a fast-track peer-review process, relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal has an emphasis on publishing interdisciplinary work and bringing research fields together, encompassing experimental, theoretical and computational work.