Classification of Breast Cancer Subtypes using Microarray RNA Expression Data

Muhammad Shazwan Suhiman, Sayang Mohd Deni, Ahmad Zia Ul-Saufie Mohamad Japeri, Aszila Asmat, Lirong Wang
Journal of Advanced Research in Applied Sciences and Engineering Technology. Published 2024-06-04. DOI: https://doi.org/10.37934/araset.46.1.7585

Abstract

Breast cancer is a heterogeneous disease involving molecular alterations, cellular alterations and clinical outcomes, which makes its classification a continuing diagnostic challenge. Current practice classifies breast cancer using immunohistochemistry markers and clinical variables, but this approach is limited by the inclusion of other tumour subtypes and healthy individuals. Machine learning approaches based on mRNA expression data offer researchers new ways to investigate molecular biomarkers as a diagnostic characteristic. This study evaluates the ranking of features (genes) produced by feature selection methods for a breast cancer diagnostic test. Three feature selection methods, information gain (IG), ReliefF and minimum redundancy maximum relevance (mRMR), were applied, and subsets of the top 100, 50, 25, 10, 5 and 3 ranked genes were created. Each subset was tested with support vector machine (SVM), logistic regression (LR) and random forest (RF) classifiers, and performance was assessed using a confusion matrix. IG, ReliefF and mRMR were each able to achieve the highest accuracy with the SVM, LR and RF classifiers. mRMR with the RF classifier achieved the highest accuracy with the fewest top-ranked genes (25). A hybrid feature selection approach (mRMR + SVM) improved the accuracy obtained from the top 3 ranked genes with the SVM, LR and RF classifiers. Future work should use other feature selection methods and classifiers to explore the classification accuracy achievable with the smallest feature subset in multiclass cancer datasets.
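The ranking, subsetting and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only information gain (not ReliefF or mRMR), it omits the SVM, LR and RF classifiers, and the median split used to discretise expression values is an assumption of this example. The gene names, toy data and predicted labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG of one gene after splitting its expression values at the median.

    The median split is an assumption of this sketch; the source does not
    say how expression values were discretised.
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for x, y in zip(values, labels) if x < threshold]
    high = [y for x, y in zip(values, labels) if x >= threshold]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - conditional


def rank_genes(expression, labels):
    """Return gene names sorted from highest to lowest information gain."""
    scores = {gene: information_gain(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


def top_k_subsets(ranked, sizes=(100, 50, 25, 10, 5, 3)):
    """Map each subset size to the top-ranked genes of that size."""
    return {k: ranked[:k] for k in sizes}


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())


# Hypothetical toy data: 3 genes measured on 6 samples.
labels = ["tumour", "tumour", "tumour", "normal", "normal", "normal"]
expression = {
    "gene_a": [9.1, 8.7, 9.4, 2.0, 2.3, 1.8],  # separates the classes cleanly
    "gene_b": [5.0, 1.0, 5.2, 1.1, 5.1, 0.9],  # weakly informative
    "gene_c": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # uninformative
}
ranked = rank_genes(expression, labels)
subsets = top_k_subsets(ranked, sizes=(3, 1))

# Hypothetical classifier output, used only to exercise the evaluation step.
predicted = ["tumour", "tumour", "normal", "normal", "normal", "normal"]
matrix = confusion_matrix(labels, predicted)
```

In a fuller pipeline, each entry of `subsets` would be used to train and test a classifier, and `accuracy(matrix)` would then be compared across subset sizes to find the smallest subset that keeps accuracy high.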