Kiki Aristiawati, T. Siswantining, Devvi Sarwinda, S. Soemartojo
Proceedings of the 8th SEAMS-UGM International Conference on Mathematics and Its Applications 2019: Deepening Mathematical Concepts for Wider Application through Multidisciplinary Research and Industries Collaborations. Published 2019-12-19. DOI: 10.1063/1.5139149 (https://doi.org/10.1063/1.5139149)
Missing values imputation based on fuzzy C-Means algorithm for classification of chronic obstructive pulmonary disease (COPD)
Chronic Obstructive Pulmonary Disease (COPD) is one of the leading causes of death in the world. The World Health Organization (WHO) reported that in 2016 COPD was the third leading cause of death worldwide, with around 3 million deaths, equivalent to 5.2% of deaths worldwide. For this reason, further research on COPD is needed. Unfortunately, the data collected in the study do not contain all the desired values; the absent entries are called missing values. Missing values are a problem for all types of data analysis. They can be handled in several ways: by filtering the data (ignoring or removing incomplete records) or by imputing the data. Ignoring or removing data reduces the amount of information contained in the data and can lower the accuracy of the analysis. To overcome this problem, imputation is carried out at the preprocessing stage to obtain a complete dataset, which is expected to increase the accuracy of the analysis. Many imputation methods can be used, such as mean imputation and Fuzzy C-Means (FCM). Fuzzy C-Means is a clustering method that allows a data point to belong to two or more clusters, according to its membership function. The completed dataset was used to train a Decision Tree classifier in order to compare the accuracy obtained with the mean and FCM methods. The analysis of the proposed imputation on classification shows that FCM is slightly more accurate than mean imputation.
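
The two imputation strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_impute` and `fcm_impute`, the parameter defaults, the use of partial distances over observed features, and the membership-weighted reconstruction are all assumptions chosen as one common way to adapt Fuzzy C-Means to incomplete data. Missing entries are assumed to be encoded as NaN in a NumPy array.

```python
import numpy as np


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means imputation (illustrative variant, not the paper's code).

    Distances and centroids use only the observed features of each row;
    missing entries are then filled with a membership-weighted mix of centroids.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)                      # True where a value is observed
    X0 = np.where(mask, X, 0.0)              # NaNs zeroed so sums ignore them
    n, d = X.shape
    n_obs = np.maximum(mask.sum(axis=1), 1)  # guard against fully missing rows

    # Random initial memberships; each row sums to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        W = U ** m
        # Centroids: weighted mean over observed entries only.
        V = (W.T @ X0) / np.maximum(W.T @ mask.astype(float), 1e-12)
        # Partial squared distance, rescaled by d / (number of observed features).
        diff = (X0[:, None, :] - V[None, :, :]) * mask[:, None, :]
        D2 = (diff ** 2).sum(axis=2) * (d / n_obs)[:, None]
        D2 = np.maximum(D2, 1e-12)
        # Standard FCM membership update.
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    # Fill missing entries with a membership-weighted combination of centroids.
    W = U ** m
    estimate = (W @ V) / W.sum(axis=1, keepdims=True)
    return np.where(mask, X, estimate)


# Toy data (made up for illustration): two separated groups, one missing value.
X_demo = np.array([
    [1.0, 1.0],
    [1.2, 0.9],
    [0.9, np.nan],
    [5.0, 5.0],
    [5.1, 4.9],
    [4.9, 5.2],
])
X_mean = mean_impute(X_demo)
X_fcm = fcm_impute(X_demo, n_clusters=2)
```

On data like this, mean imputation fills the gap with the overall column mean, whereas the FCM variant fills it with a value close to the centroid of the cluster the row most strongly belongs to. Either completed array could then be passed to a classifier, such as a decision tree, to compare accuracy as the abstract describes.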