A Comprehensive Exploration on Impact of Preprocessing for Prediction of Chronic Kidney Disease Using Multiple Machine Learning Approaches
Authors: Nahid Hossain Taz, Abrar Islam, Ishrak Mahmud, Ehtashamul Haque, Md. Raqibur Rahman
Published in: 2021 International Conference on Science & Contemporary Technologies (ICSCT), 2021-08-05
DOI: 10.1109/icsct53883.2021.9642638 (https://doi.org/10.1109/icsct53883.2021.9642638)
Citation count: 0
Abstract
This manuscript aims to develop a framework for predicting Chronic Kidney Disease (CKD) in a patient using machine learning techniques. The performance of five individual machine learning classifiers is analyzed to compare how well each predicts CKD. The classifiers are evaluated on average classification accuracy, precision, recall, and F1 score, along with the ROC AUC score. For this investigation, Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Naive Bayes, and Random Forest are applied as distinct classifiers. To increase and stabilize the classifiers' performance metrics, necessary data preprocessing is carried out on the CKD dataset. The resulting metrics indicate that Random Forest outperformed all the other classifiers, producing an accuracy of 93.4% and ROC AUC of 94.4% before data preprocessing, and an accuracy of 95.6% and ROC AUC of 96.2% after preprocessing.
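The before/after comparison described in the abstract could be set up roughly as in the following sketch, written with scikit-learn. Several things here are assumptions rather than details from the paper: the data is a synthetic stand-in generated with `make_classification` (the CKD dataset itself is not used), "before preprocessing" is approximated by median imputation alone (most of these classifiers cannot train on missing values), "after preprocessing" adds standard scaling, and the `evaluate` helper, the 5-fold cross-validation, and all hyperparameters are illustrative choices. The abstract does not say which preprocessing steps the authors applied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the CKD data (assumption: not the dataset used in the paper).
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=0)
rng = np.random.default_rng(0)
X = X * rng.uniform(1, 100, size=X.shape[1])  # put features on very different scales
X[rng.random(X.shape) < 0.05] = np.nan        # inject roughly 5% missing values

# The five classifier families named in the abstract; hyperparameters are illustrative.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
}
SCORING = ["accuracy", "precision", "recall", "f1", "roc_auc"]


def evaluate(model, scale):
    """Return mean 5-fold scores for one classifier (hypothetical helper)."""
    steps = [SimpleImputer(strategy="median")]  # needed in both settings to remove NaNs
    if scale:
        steps.append(StandardScaler())          # assumed "preprocessing" step
    pipeline = make_pipeline(*steps, model)
    cv = cross_validate(pipeline, X, y, cv=5, scoring=SCORING)
    return {metric: cv[f"test_{metric}"].mean() for metric in SCORING}


results = {
    name: {"before": evaluate(clf, scale=False), "after": evaluate(clf, scale=True)}
    for name, clf in classifiers.items()
}

for name, r in results.items():
    print(
        f"{name:24s} "
        f"accuracy {r['before']['accuracy']:.3f} -> {r['after']['accuracy']:.3f}  "
        f"ROC AUC {r['before']['roc_auc']:.3f} -> {r['after']['roc_auc']:.3f}"
    )
```

Putting the imputer and scaler inside the pipeline means they are fitted on the training folds only, so the cross-validated scores are not inflated by information leaking from the held-out fold. Scores from this synthetic data say nothing about the figures reported in the abstract.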