Two Machine-learning Hybrid Models for Predicting Type 2 Diabetes Mellitus.

Impact factor: 1.3; JCR quartile: Q4 (Engineering, Biomedical)

Journal of Medical Signals & Sensors. Pub date: 2025-04-19. eCollection date: 2025-01-01. DOI: 10.4103/jmss.jmss_29_24

Rahman Farnoosh, Karlo Abnoosian, Rasha Abbas Isewid

Volume 15, page 11. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12063970/pdf/

Citations: 0

Abstract

Background: The global increase in diabetes prevalence necessitates advanced diagnostic methods. Machine learning has shown promise in disease diagnosis, including diabetes.

Materials and methods: We used a dataset collected from the Medical City Hospital laboratory and the Specialized Center for Endocrinology and Diabetes at Al-Kindy Teaching Hospital in Iraq. The dataset comprises 1000 physical examination samples from male and female patients, categorized into three classes: diabetic (Y), nondiabetic (N), and predicted diabetic (P). It contains twelve attributes and includes outliers. Because outliers in medical data can reflect unusual disease attributes, a specialist physician should be consulted when identifying and handling them with statistical methods. The main contribution of this study is two hybrid models for diabetes diagnosis, one for each of two scenarios: (1) Scenario 1 (outliers present): Hybrid Model 1 combines the K-medoids clustering algorithm with a Gaussian naive Bayes (GNB) classifier based on kernel density estimation (KDE) to handle outliers; and (2) Scenario 2 (outliers removed): Hybrid Model 2 combines the K-means clustering algorithm with a GNB classifier based on KDE with a suitable bandwidth. We applied principal component analysis to reduce dimensionality and evaluated the models using fivefold cross-validation.
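As an illustration of the classifier component only, the following is a minimal sketch of a naive Bayes classifier whose per-feature class-conditional densities are estimated with a Gaussian kernel, together with a plain fivefold index split. It is one possible reading of "GNB classifier based on KDE", not the authors' implementation. The abstract does not say how the clustering step feeds the classifier, so K-medoids, K-means, and PCA are omitted here. The names `KDENaiveBayes`, `gaussian_kde_logpdf`, and `k_fold_indices`, and the default bandwidth of 0.5, are assumptions made for the example.

```python
import math
from collections import defaultdict


def gaussian_kde_logpdf(x, samples, bandwidth):
    """Log of a 1-D Gaussian kernel density estimate evaluated at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    # A small floor avoids log(0) when x is far from every training sample.
    return math.log(max(total / norm, 1e-300))


class KDENaiveBayes:
    """Naive Bayes with KDE-estimated per-feature class-conditional densities."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # hypothetical default; tune per dataset

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(X)
        self.log_prior_ = {c: math.log(len(rows) / n) for c, rows in by_class.items()}
        # For each class, keep the training values of every feature column.
        self.columns_ = {c: list(zip(*rows)) for c, rows in by_class.items()}
        return self

    def _predict_one(self, row):
        scores = {}
        for c, cols in self.columns_.items():
            # Naive independence assumption: sum the per-feature log densities.
            scores[c] = self.log_prior_[c] + sum(
                gaussian_kde_logpdf(v, col, self.bandwidth)
                for v, col in zip(row, cols)
            )
        return max(scores, key=scores.get)

    def predict(self, X):
        return [self._predict_one(row) for row in X]


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Toy data, invented for the example (not from the study's dataset).
X_toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y_toy = ["N", "N", "N", "Y", "Y", "Y"]
model = KDENaiveBayes(bandwidth=0.5).fit(X_toy, y_toy)
predictions = model.predict([[0.1, 0.1], [5.0, 5.0]])
```

In practice the folds would normally be shuffled and stratified by class, and the bandwidth chosen by validation; both are left out to keep the sketch short.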

Results: All experiments were conducted under identical settings. In both scenarios (handling outliers and removing them), the proposed hybrid models outperformed the other machine-learning models evaluated in this study: support vector machines (with radial basis function, polynomial, linear, and sigmoid kernels), decision trees (J48), and GNB classifiers. The average accuracy was 0.9743 for Hybrid Model 1 in Scenario 1 and 0.9867 for Hybrid Model 2 in Scenario 2. Precision, sensitivity, and F1-score were also evaluated as performance metrics.
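For readers unfamiliar with the metrics named above, the sketch below computes accuracy and per-class precision, sensitivity (recall), and F1-score from lists of true and predicted labels, then macro-averages them over the classes. The abstract does not state how the metrics were averaged across the three classes, so macro-averaging is an assumption, as are the function names and the toy labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, positive):
    """Precision, sensitivity, and F1 treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of each metric over all classes seen in y_true."""
    labels = sorted(set(y_true))
    rows = [per_class_metrics(y_true, y_pred, c) for c in labels]
    return tuple(sum(col) / len(labels) for col in zip(*rows))


# Toy labels, invented for the example (not results from the study).
y_true = ["Y", "Y", "Y", "N", "N", "P"]
y_pred = ["Y", "Y", "N", "N", "N", "P"]
acc = accuracy(y_true, y_pred)
prec_y, sens_y, f1_y = per_class_metrics(y_true, y_pred, "Y")
macro_prec, macro_sens, macro_f1 = macro_average(y_true, y_pred)
```

With one diabetic sample misclassified as nondiabetic, class Y keeps a precision of 1 but its sensitivity drops to 2/3, which is why sensitivity is usually reported alongside accuracy in diagnostic settings.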

Conclusion: This study presents two hybrid models for diabetes diagnosis, demonstrating high accuracy in distinguishing between diabetic and nondiabetic patients and effectively handling outliers. The findings highlight the potential of machine-learning techniques for improving the early diagnosis and treatment of diabetes.


Source journal

Journal of Medical Signals & Sensors (Engineering, Biomedical)

CiteScore: 2.30

Self-citation rate: 0.00%

Review time: 33 weeks

About the journal: JMSS is an interdisciplinary journal covering all aspects of biomedical engineering, including bioelectrics, bioinformatics, medical physics, and health technology assessment. Subject areas covered by the journal include:
- Bioelectric: bioinstruments, biosensors, modeling, biomedical signal processing, medical image analysis and processing, medical imaging devices, control of biological systems, neuromuscular systems, cognitive sciences, telemedicine, robotics, medical ultrasonography, bioelectromagnetics, electrophysiology, cell tracking
- Bioinformatics and medical informatics: analysis of biological data, data mining, stochastic modeling, computational genomics, artificial intelligence & fuzzy applications, medical software, bioalgorithms, electronic health
- Biophysics and medical physics: computed tomography, radiation therapy, laser therapy
- Education in biomedical engineering
- Health technology assessment
- Standards in biomedical engineering