Comparative Analysis of Support Vector Machine (SVM) and Random Forest (RF) Classification for Cancer Detection using Microarray

Irawansyah, Adiwijaya, W. Astuti
2021 9th International Conference on Information and Communication Technology (ICoICT), published 2021-08-03
DOI: 10.1109/ICoICT52021.2021.9527458 (https://doi.org/10.1109/ICoICT52021.2021.9527458)
Citations: 1

Abstract

Cancer is the second leading cause of death globally. According to the World Health Organization (WHO), approximately 9.6 million deaths were caused by cancer in 2018. Globally, about 1 in 6 deaths is caused by cancer. One way to detect cancer is to classify microarray data. Microarray technology measures the expression of thousands of genes simultaneously, which supports the analysis and diagnosis of cancer. However, microarray data are high-dimensional: each sample has a very large number of features, while only a small number of samples is available, and this imbalance leads to poor classification performance. Dimensionality reduction is therefore needed. In this study, Random Projection (RP) is used to reduce the dimensionality of the microarray data, and Support Vector Machine (SVM) and Random Forest (RF) are used as classifiers. The two classifiers are compared to determine which performs best when RP is used for dimensionality reduction. With the system built in this study, the best accuracy on Colon Tumor is 69.23%, obtained with RP-SVM; Lung Cancer and Ovarian Cancer reach 100% with both classifiers; Prostate Tumor reaches 95.12% with both classifiers; and Central Nervous System reaches 66.66% with both classifiers.
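The abstract does not describe the implementation. The sketch below, which assumes Python with scikit-learn, shows the general shape of an RP-then-classify comparison: features are standardized, projected to a lower dimension with a Gaussian random projection, and classified by either a linear SVM or a random forest, with accuracy estimated by stratified k-fold cross-validation. The function name `compare_rp_classifiers`, the projected dimension, the SVM kernel, the number of trees, the standardization step and the evaluation protocol are all hypothetical choices, not the authors' settings. The data are synthetic and are not the paper's datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC


def compare_rp_classifiers(X, y, n_components=100, n_splits=5, seed=0):
    """Mean cross-validated accuracy of RP-SVM and RP-RF on (X, y).

    Hypothetical helper: n_components, the SVM kernel, the forest size and
    the CV scheme are illustrative assumptions, not values from the paper.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        # Standardize each gene, project to n_components dims, then a linear SVM.
        "RP-SVM": make_pipeline(
            StandardScaler(),
            GaussianRandomProjection(n_components=n_components, random_state=seed),
            SVC(kernel="linear"),
        ),
        # Same preprocessing and projection, then a random forest.
        "RP-RF": make_pipeline(
            StandardScaler(),
            GaussianRandomProjection(n_components=n_components, random_state=seed),
            RandomForestClassifier(n_estimators=200, random_state=seed),
        ),
    }
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="accuracy")))
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in shaped like a small microarray set:
    # few samples, thousands of features, only a handful informative.
    X, y = make_classification(
        n_samples=62, n_features=2000, n_informative=20, random_state=0
    )
    for name, acc in compare_rp_classifiers(X, y).items():
        print(f"{name}: {acc:.4f}")
```

Keeping the scaler and the projection inside each pipeline means they are refit on the training folds only, so statistics from the held-out fold do not leak into preprocessing. With so few samples, the cross-validated accuracy can vary noticeably with the random seed of the projection, so repeating the comparison over several seeds is a reasonable precaution.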