MACHINE LEARNING ALGORITHM SELECTION FOR CHRONIC KIDNEY DISEASE DIAGNOSIS AND CLASSIFICATION

IF 1.2; Zone 4 (Computer Science); JCR Q4 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Malaysian Journal of Computer Science Pub Date : 2022-03-31 DOI:10.22452/mjcs.sp2022no1.8

M. Gokiladevi, Sundar Santhoshkumar, Vijayakumar Varadarajan

Citations: 2

Abstract

Over recent decades, chronic kidney disease (CKD) has become a global health problem that continues to grow worldwide. It is a chronic illness strongly associated with increased morbidity and mortality, cardiovascular disease, and high healthcare costs. Early identification and classification of CKD is regarded as a major factor in controlling the mortality rate. Data mining (DM) techniques extract hidden details from clinical and laboratory patient data, which can help doctors improve diagnostic accuracy. Recently, machine learning (ML) techniques have been commonly employed for disease prediction and classification in the healthcare sector. With this motivation, this study examines the performance of different ML algorithms for diagnosing CKD at an early stage. The proposed model pre-processes the data in two stages: missing value replacement and data transformation. Five ML-based classification models are then compared: support vector machine (SVM), random forest (RF), logistic regression (LR), K-nearest neighbor (KNN), and decision tree (DT). To investigate the performance of the different ML models, a benchmark CKD dataset from the UCI repository is employed and the results are examined from several aspects. Among the classifiers, the RF model achieved the best results, with a precision of 0.99, a recall of 0.99, and an F-score of 0.99, together with the lowest error rate, 0.012.
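The abstract names the pipeline steps but not their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes mean/mode filling for missing value replacement, integer coding plus min-max scaling for data transformation, and a hand-written K-nearest neighbor classifier standing in for the five models compared in the study. All function names, the column choices, and the toy records are hypothetical; the records are made up for illustration and are not taken from the UCI dataset.

```python
from collections import Counter
from math import dist
from statistics import mean


def impute(rows):
    """Replace None with the column mean (numeric) or mode (categorical).
    Assumption: the abstract does not say which replacement rule was used."""
    fills = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fills.append(mean(present))
        else:
            fills.append(Counter(present).most_common(1)[0][0])
    return [[f if v is None else v for v, f in zip(row, fills)] for row in rows]


def transform(rows):
    """Code categorical values as integers, then min-max scale each column to [0, 1].
    Assumption: one plausible reading of "data transformation"."""
    cols = []
    for col in zip(*rows):
        if not all(isinstance(v, (int, float)) for v in col):
            codes = {v: i for i, v in enumerate(sorted(set(col)))}
            col = [codes[v] for v in col]
        lo, hi = min(col), max(col)
        cols.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    return [list(r) for r in zip(*cols)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="ckd"):
    """Precision, recall, F-score for the positive class, plus overall error rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    error_rate = sum(t != p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "error_rate": error_rate}


if __name__ == "__main__":
    # Made-up toy records (age, serum creatinine, hypertension); NOT the UCI data.
    raw = [[48, 1.2, "yes"], [62, None, "yes"], [35, 0.8, "no"],
           [None, 0.9, "no"], [70, 3.1, None], [29, 0.7, "no"]]
    labels = ["ckd", "ckd", "notckd", "notckd", "ckd", "notckd"]
    X = transform(impute(raw))
    # Leave-one-out: predict each record from the remaining five.
    preds = [knn_predict(X[:i] + X[i + 1:], labels[:i] + labels[i + 1:], X[i])
             for i in range(len(X))]
    print(evaluate(labels, preds))
```

The error rate here is the fraction of misclassified records, so an error rate of 0.012 corresponds to an accuracy of 0.988. In practice a library such as scikit-learn would likely replace the hand-written classifier and make it straightforward to compare SVM, RF, LR, KNN, and DT on the same pre-processed data.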

Source journal

Malaysian Journal of Computer Science (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS)

CiteScore

2.20

Self-citation rate

33.30%

Articles published

Review time

7.5 months

About the journal: The Malaysian Journal of Computer Science (ISSN 0127-9084) has been published since 1985 by the Faculty of Computer Science and Information Technology, University of Malaya, and appears four times a year, in January, April, July and October. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. Rigorous reviews from the referees have helped maintain the journal's high standard. Its objectives are to promote the exchange of information and knowledge in research work and in new inventions and developments in Computer Science, to promote the use of Information Technology in building an information-rich society, and to assist academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions in publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is indexed and abstracted by Clarivate Analytics' Web of Science and Elsevier's Scopus.