A dynamic model using k-NN algorithm for predicting diabetes and breast cancer

IF 7 | Zone 2, Medicine | JCR Q1, BIOLOGY

Computers in Biology and Medicine. Pub Date: 2025-05-13. DOI: 10.1016/j.compbiomed.2025.110276

Hussein A.A. Al-Khamees, Nor Samsiah Sani, Ahmed Sileh Gifal, Luan Xiang Wei Liu, Mohd Isrul Esa

Volume 192, Article 110276. Full text: https://www.sciencedirect.com/science/article/pii/S0010482525006274

Citations: 0

Abstract

Healthcare remains a critical focus because of its direct impact on human well-being. Diabetes, currently the fastest-growing chronic disease globally, poses severe health risks, including cardiovascular complications and kidney failure. Breast cancer, meanwhile, has become the most prevalent cancer among women, particularly those in their 40s. Early detection and diagnosis of these two diseases remain a substantial challenge, yet they are crucial for reducing mortality. Machine learning algorithms have emerged as powerful tools in healthcare for disease classification and prediction, and k-nearest neighbors (k-NN) is one of the most widely used supervised learning algorithms. Many traditional machine learning methods have been proposed, but they are heavily specialized for specific datasets. In particular, traditional k-NN relies on a static value of k, which may not give optimal results across diverse datasets. This paper proposes a dynamic k-NN model that adjusts the value of k based on local data characteristics to improve prediction accuracy. The proposed model is tested on two publicly available datasets: PIMA Diabetes and Breast Cancer Wisconsin (BCW). The results are evaluated using accuracy, precision, recall, F1-score, and execution time. For the PIMA and BCW datasets respectively, the model achieves accuracy of 81.17% and 97.37%, precision of 83.33% and 100%, recall of 54.55% and 86.05%, and F1-score of 65.93% and 92.5%. These results show that the proposed model outperformed several state-of-the-art models, further highlighting its effectiveness and efficiency in medical data classification.
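The abstract does not say how k is adapted, so the following is only a minimal sketch of one possible per-query rule, not the authors' method. It assumes Euclidean distance and a hypothetical selection rule: among a few candidate values of k, use the one whose nearest neighbours vote most unanimously, preferring the larger k on ties. The function name `dynamic_knn_predict`, the `candidate_ks` parameter, and the toy data are all illustrative.

```python
from collections import Counter
from math import dist


def dynamic_knn_predict(train_X, train_y, query, candidate_ks=(3, 5, 7, 9)):
    """Classify `query`, choosing k separately for each query point.

    Hypothetical rule (an assumption, not taken from the paper): pick the
    candidate k whose neighbourhood has the highest share of votes for its
    majority label; on ties, prefer the larger k.
    Returns a (predicted_label, chosen_k) tuple.
    """
    # Keep only candidate k values that fit the training set size.
    ks = [k for k in sorted(candidate_ks) if k <= len(train_X)]
    if not ks:
        raise ValueError("no candidate k fits the training set size")

    # Indices of training points, nearest to the query first (Euclidean).
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))

    best_label, best_agreement, best_k = None, -1.0, None
    for k in ks:
        votes = Counter(train_y[i] for i in order[:k])
        label, count = votes.most_common(1)[0]
        agreement = count / k  # share of the k neighbours backing `label`
        if agreement >= best_agreement:  # ">=" lets a larger k win ties
            best_label, best_agreement, best_k = label, agreement, k
    return best_label, best_k


# Toy usage: two well-separated clusters with labels 0 and 1.
toy_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
print(dynamic_knn_predict(toy_X, toy_y, (0.5, 0.5)))
```

A static k-NN would use the same k for every query; here the chosen k can differ from point to point, which is the general idea the abstract describes. Other local criteria (for example, neighbourhood density) could replace the agreement rule.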


Source journal: Computers in Biology and Medicine (Engineering and Technology - Engineering: Biomedical)

CiteScore: 11.70

Self-citation rate: 10.40%

Articles published: 1086

Review time: 74 days

About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.