Feature selection algorithm using information gain based clustering for supporting the treatment process of breast cancer
Tresna Maulana Fahrudin, I. Syarif, Ali Ridho Barakbah
2016 International Conference on Informatics and Computing (ICIC)
DOI: 10.1109/IAC.2016.7905680 (https://doi.org/10.1109/IAC.2016.7905680)
Citations: 9
Abstract
Breast cancer is the second most common cancer among Indonesian women. The large number of breast cancer patients in Indonesia also affects their prospects of recovery through routine treatment. Malignancy and probability of death are among the many determinants of a breast cancer patient's recovery. This research examines the determinant factors of breast cancer patient treatment based on the patient's latest condition. The dataset was obtained from an oncology hospital in East Java, Indonesia, and consists of 1907 samples, 18 attributes, and 2 classes. We used information gain, computed with the entropy formula, as the feature selection technique to select the attributes that contribute most to the data. We used a clustering algorithm to determine how many attributes can be removed from the ranking produced by information gain. The clustering algorithm, Hierarchical K-means (an optimized K-means), categorized patients into two groups: normal and cancer. Our experiments show that the information gain method selected 12 of the 18 attributes as contributing most to breast cancer patient treatment based on the latest condition. The clustering error ratio decreased from 44.48% (using the 18 original attributes) to 21.42% (using the 12 most important attributes).
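
To make the ranking step concrete, the following is a minimal Python sketch of entropy-based information gain, not the authors' implementation. It assumes every attribute is categorical (the abstract does not say how continuous attributes are handled). The attribute names `attr_a` and `attr_b` and the four-sample toy data are hypothetical. The error-ratio helper uses an assumed definition, since the abstract does not give one: the fraction of samples whose cluster disagrees with their class, taking the better of the two possible cluster-to-class mappings. Hierarchical K-means itself is not implemented here; cluster assignments are taken as input.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the weighted entropy after splitting on one
    categorical attribute (assumption: attributes are categorical)."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Return attribute names sorted by decreasing information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


def clustering_error_ratio(cluster_ids, labels):
    """Assumed definition: share of samples whose cluster (0 or 1) disagrees
    with their class. Cluster numbering is arbitrary, so the better of the
    two possible cluster-to-class mappings is used."""
    classes = sorted(set(labels))  # expects exactly two classes
    mismatches = sum(classes[c] != y for c, y in zip(cluster_ids, labels))
    rate = mismatches / len(labels)
    return min(rate, 1 - rate)


# Hypothetical toy data: two classes, two categorical attributes.
labels = ["cancer", "cancer", "normal", "normal"]
columns = {
    "attr_a": ["x", "x", "y", "y"],  # separates the classes perfectly
    "attr_b": ["p", "q", "p", "q"],  # carries no class information
}

ranking = rank_attributes(columns, labels)
print(ranking)
```

In a workflow like the one described, the lowest-ranked attributes would be dropped one group at a time and the clustering error ratio recomputed, keeping the attribute subset that gives the lowest error.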