Feature selection algorithm using information gain based clustering for supporting the treatment process of breast cancer

Tresna Maulana Fahrudin, I. Syarif, Ali Ridho Barakbah
DOI: 10.1109/IAC.2016.7905680
Published in: 2016 International Conference on Informatics and Computing (ICIC)
Citations: 9

Abstract

Breast cancer is the second most common type of cancer among Indonesian women. The large number of breast cancer patients in Indonesia also affects their life expectancy and their chances of recovering through routine treatment. Malignancy and the probability of death are among the many factors that determine a breast cancer patient's recovery. This research examines the factors that determine breast cancer patient treatment based on the patient's latest condition. The dataset was obtained from an oncology hospital in East Java, Indonesia, and consists of 1907 samples, 18 attributes, and 2 classes. We used information gain, computed with the entropy formula, as the feature selection technique to select the attributes that contribute most to the data. We then used a clustering algorithm to determine how many of the attributes ranked by information gain can be removed. The clustering algorithm, Hierarchical K-means (a K-means optimization), categorized the patients into two groups: normal and cancer. Our experiments show that the information gain method selected 12 of the 18 attributes as having the highest contribution to breast cancer patient treatment based on the latest condition. The clustering error ratio decreased from 44.48% (using the 18 original attributes) to 21.42% (using the 12 most important attributes).
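The abstract does not include the authors' code, so what follows is only a minimal Python sketch of entropy-based information gain ranking, assuming discrete (categorical or pre-binned) attributes. The attribute names and toy rows are hypothetical. `error_ratio` is also an assumption: it treats the error ratio as the share of samples that do not belong to their cluster's majority class, which the abstract does not define. The Hierarchical K-means step the authors use to choose the cut-off is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y) = -sum_c p_c * log2(p_c) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; A) = H(Y) - sum_v (|Y_v| / |Y|) * H(Y_v) for one discrete attribute A."""
    n = len(labels)
    groups = {}  # attribute value -> class labels of the samples with that value
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels, names):
    """Return (name, IG) pairs sorted from most to least informative (ties keep input order)."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, information_gain(column, labels)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def error_ratio(cluster_ids, labels):
    """Hypothetical metric: fraction of samples outside their cluster's majority class."""
    clusters = {}
    for cid, label in zip(cluster_ids, labels):
        clusters.setdefault(cid, []).append(label)
    correct = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return 1 - correct / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data: three discrete attributes, two classes.
    names = ["stage", "age_group", "noise"]
    rows = [("III", "old", "a"), ("III", "young", "b"),
            ("I", "old", "a"), ("I", "young", "b")]
    labels = ["cancer", "cancer", "normal", "normal"]
    for name, ig in rank_attributes(rows, labels, names):
        print(f"{name}: {ig:.3f}")
    print("error ratio:", error_ratio([0, 0, 1, 1], labels))
```

Keeping the top k ranked attributes (12 of 18 in the paper) mirrors the selection step; in this sketch k would have to be chosen by hand or by a separate clustering step, such as comparing `error_ratio` across candidate attribute subsets.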