Tahira Mahboob, A. Ijaz, Amber Shahzad, Muqadas Kalsoom
2018 12th International Conference on Open Source Systems and Technologies (ICOSST), published 2018-12-01. DOI: 10.1109/ICOSST.2018.8632179 (https://doi.org/10.1109/ICOSST.2018.8632179)
Handling Missing Values in Chronic Kidney Disease Datasets Using KNN, K-Means and K-Medoids Algorithms
Missing values in large datasets pose a persistent difficulty for researchers and industry practitioners. In medicine in particular, datasets contain missing values because of human error or the non-availability of data. When such datasets are used for inference or predictive studies, the results are less reliable. Discarding the affected instances is an option, but it affects overall accuracy, so applying a replacement or imputation technique is a viable alternative. Imputation techniques estimate the missing values in a dataset by applying various algorithms. In this paper we therefore present a framework that assists in imputing missing values in a large Chronic Kidney Disease (CKD) dataset. We use three machine learning algorithms, namely K-Nearest Neighbors (KNN), K-Means and K-Medoids clustering, to impute the missing values. The proposed technique is evaluated by applying Decision Tree and Random Forest algorithms to the imputed data. Experimental results demonstrate that KNN provides the most accurate results compared with K-Means and K-Medoids clustering. KNN achieves an accuracy of 86.67% with the Decision Tree algorithm and 75.25% with the Random Forest algorithm. It also yields lower relative, absolute and root mean square errors. Consequently, the KNN-imputed datasets are used in our research for future predictions.
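To make the idea of KNN imputation concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `knn_impute`, the choice of distance (Euclidean over the features two rows both have observed, averaged over the number of shared features), the value of `k`, and the toy rows are all assumptions made for illustration. The three columns stand in for hypothetical CKD-style attributes (age, blood pressure, serum creatinine) and the values are invented.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption: distance is Euclidean over the features both rows have
    observed, averaged over the number of shared features.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features, so the rows cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Invented toy rows: [age, blood pressure, serum creatinine]; None marks a gap.
data = [
    [48.0, 80.0, 1.2],
    [50.0, 80.0, None],
    [62.0, 90.0, 3.8],
    [51.0, 70.0, 1.1],
    [60.0, None, 4.0],
]

filled = knn_impute(data, k=2)
for original, completed in zip(data, filled):
    print(original, "->", completed)
```

In this sketch the features are left unscaled, so attributes with large numeric ranges dominate the distance; in practice one would likely normalize the columns first. A downstream classifier such as a Decision Tree or Random Forest could then be trained on the completed rows to compare imputation methods, as the abstract describes.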