Tresna Maulana Fahrudin, I. Syarif, Ali Ridho Barakbah
2016 International Conference on Informatics and Computing (ICIC). DOI: 10.1109/IAC.2016.7905680 (https://doi.org/10.1109/IAC.2016.7905680)
Feature selection algorithm using information gain based clustering for supporting the treatment process of breast cancer
Breast cancer is the second most common cancer affecting Indonesian women. The large number of breast cancer patients in Indonesia also affects their life expectancy and their prospects of recovery through routine treatment. Malignancy and probability of death are among the many determinants of a breast cancer patient's recovery. This research examines the determinant factors of breast cancer patient treatment based on the patient's latest condition. The dataset was taken from an oncology hospital in East Java, Indonesia, and consists of 1907 samples, 18 attributes, and 2 classes. We used information gain, computed from the entropy formula, as the feature selection technique to select the attributes that contribute most to the data. We then used a clustering algorithm to determine how many attributes can be removed from the ranking produced by information gain. The clustering algorithm, Hierarchical K-means (a K-means optimization), categorized patients into two groups, normal and cancer. Our experiments show that the information gain method selected 12 of the 18 attributes as having the highest contribution to breast cancer patient treatment based on the latest condition. The clustering error ratio decreased from 44.48% (using the 18 original attributes) to 21.42% (using the 12 most important attributes).
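The ranking step described in the abstract can be illustrated with a minimal Python sketch of entropy-based information gain. This is not the authors' implementation: it assumes every attribute is discrete (the abstract does not say how continuous attributes are handled), and the attribute names `attr_a` and `attr_b`, the toy rows, and the helper names `entropy`, `information_gain`, and `rank_attributes` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting the samples by a discrete attribute's values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels, names):
    """Return (name, gain) pairs sorted from highest to lowest gain."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: attr_a separates the classes, attr_b does not.
names = ["attr_a", "attr_b"]
rows = [("x", "p"), ("x", "q"), ("y", "p"), ("y", "q")]
labels = ["cancer", "cancer", "normal", "normal"]

ranking = rank_attributes(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In a pipeline like the one the abstract outlines, the lowest-ranked attributes would then be dropped one group at a time and the clustering error ratio re-measured, keeping the subset that gives the lowest error.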