A dynamic model using k-NN algorithm for predicting diabetes and breast cancer
Hussein A.A. Al-Khamees, Nor Samsiah Sani, Ahmed Sileh Gifal, Luan Xiang Wei Liu, Mohd Isrul Esa
Computers in Biology and Medicine, vol. 192, Article 110276. Published 2025-05-13. DOI: 10.1016/j.compbiomed.2025.110276
https://www.sciencedirect.com/science/article/pii/S0010482525006274
Journal metrics: impact factor 7.0, JCR Q1 (Biology)
Citations: 0
Abstract
Healthcare remains a critical focus due to its direct impact on human well-being. Diabetes, currently the fastest-growing chronic disease globally, poses severe health risks, including cardiovascular complications and kidney failure. Meanwhile, breast cancer has become the most prevalent cancer among women, particularly those in their 40s, surpassing other types. Early detection and diagnosis of these two diseases remain a substantial challenge, yet they are crucial for reducing mortality rates. Machine learning algorithms have emerged as powerful tools in healthcare for disease classification and prediction, with k-nearest neighbors (k-NN) being one of the most widely used supervised learning algorithms. Many traditional machine learning methods have been proposed, but they are heavily specialized for specific datasets. More fundamentally, traditional k-NN relies on a static value of k, which may not provide optimal results across diverse datasets. This paper proposes a dynamic k-NN model that adjusts the value of ‘k’ based on local data characteristics, enhancing prediction accuracy. The proposed model is tested on two publicly available datasets: PIMA Diabetes and Breast Cancer Wisconsin (BCW). Results are evaluated using accuracy, precision, recall, F1-score, and execution time. For the PIMA and BCW datasets respectively, the model achieves accuracy of 81.17% and 97.37%, precision of 83.33% and 100%, recall of 54.55% and 86.05%, and F1-score of 65.93% and 92.5%. These results demonstrate that the proposed model outperforms several state-of-the-art models, further highlighting its effectiveness and efficiency in medical data classification.
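To make the idea of a per-query k concrete, here is a minimal sketch in plain Python. The abstract does not say how the model derives k from "local data characteristics", so the selection rule below is purely an assumption for illustration: among a few candidate values, choose the k whose neighbourhood has the highest majority-label share, preferring the larger k on ties. The function name `dynamic_knn_predict`, the `candidate_ks` parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dynamic_knn_predict(train_X, train_y, query, candidate_ks=(3, 5, 7, 9)):
    """Return (predicted_label, chosen_k) for a single query point.

    ASSUMED rule (not the paper's): pick the candidate k whose k nearest
    neighbours agree most strongly on one label; ties go to the larger k.
    """
    # Rank training labels by distance to the query, nearest first.
    ranked = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))
    labels = [y for _, y in ranked]

    best_k, best_label, best_share = None, None, -1.0
    for k in sorted(candidate_ks):
        if k > len(labels):
            break  # not enough training points for this k
        label, count = Counter(labels[:k]).most_common(1)[0]
        share = count / k  # fraction of neighbours voting for the majority
        if share >= best_share:  # ">=" lets a larger k win a tie
            best_k, best_label, best_share = k, label, share

    if best_k is None:
        raise ValueError("no candidate k fits the training set size")
    return best_label, best_k


# Toy data: a small cluster of class 0 and a larger cluster of class 1.
train_X = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
    (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3),
    (2.9, 2.8), (3.3, 3.1), (2.7, 2.9),
]
train_y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

print(dynamic_knn_predict(train_X, train_y, (1.0, 1.0)))  # (0, 3)
print(dynamic_knn_predict(train_X, train_y, (3.0, 3.0)))  # (1, 7)
```

In this toy example the query inside the three-point cluster is classified with k = 3, while the query inside the seven-point cluster uses k = 7. A fixed k of 7 would have misclassified the first query, which is the kind of failure a static k-value can produce on uneven data.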
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.