Optimising hyperparameters with a tree structured Parzen estimator to improve diabetes prediction.

Impact factor 3.9; Zone 2 (multidisciplinary journals); JCR Q1, Multidisciplinary Sciences
Raafat M Munshi, Lammar R Munshi, Hanen Himdi, Amjad Qashlan, Reema Munshi, Othman Y Alyahyawy, Mashael M Khayyat
Scientific Reports, vol. 15, no. 1, 35430. Journal article, published 2025-10-10.
DOI: 10.1038/s41598-025-19295-x (https://doi.org/10.1038/s41598-025-19295-x)
Citations: 0

Abstract


Diabetes is a lifelong condition that occurs when the pancreas loses its ability to secrete insulin or experiences a significant reduction in insulin production. Early identification of high-risk patients is crucial for timely interventions and improved outcomes. Traditional clinical risk prediction models rely on regression analysis using clinical, sociodemographic, and anthropometric data; however, they have limitations in terms of accuracy and generalizability. This research proposes a diagnostic strategy leveraging machine learning (ML) techniques, specifically the XGBoost algorithm optimised with Optuna, to enhance high-risk prediction based on laboratory parameters. The study utilises an open-access diabetes dataset incorporating patient demographics, laboratory test results, and clinical outcomes. Data preprocessing, including cleaning, normalisation, and feature extraction, is performed using an Adaptive Tree-Structured Parzen Estimator (ATPE) and XGBoost model. The proposed model outperforms conventional classification models, achieving 83% accuracy, 80% precision, 78% recall, and a 78% F1 score. A comprehensive correlation and confusion matrix evaluation highlights the model's effectiveness in distinguishing high-risk patients. Findings indicate that integrating machine learning (ML)-based risk classification frameworks with laboratory test-based diagnostic strategies improves predictive accuracy and patient stratification. However, data quality, population diversity, and real-time applicability remain challenges. Future research should explore the integration of real-time data from wearable devices and expand model deployment to other chronic and rare diseases, enhancing adaptability and clinical decision-making.
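To make the tuning step concrete, the following is a minimal sketch of the idea behind a tree-structured Parzen estimator, written in plain Python for a single continuous hyperparameter. It is not the authors' pipeline, not Optuna's implementation, and not the adaptive (ATPE) variant named in the abstract. The function names `parzen_density`, `tpe_minimise` and `toy_loss`, the fixed kernel bandwidth, the quantile `gamma`, and the toy objective standing in for a validation loss are all assumptions made for illustration.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Parzen (kernel density) estimate: mean of Gaussian kernels on past points."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * norm)


def tpe_minimise(objective, low, high, n_startup=10, n_trials=40,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified 1-D TPE loop. Returns (best_x, best_loss, history)."""
    assert 0.0 < gamma < 1.0
    n_startup = max(2, n_startup)      # need at least one "good" and one "bad" point
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0    # fixed kernel width (illustrative assumption)
    history = []                       # list of (x, loss) pairs

    for trial in range(n_trials):
        if trial < n_startup:
            # Warm-up: plain random search.
            x = rng.uniform(low, high)
        else:
            # Split past trials into the best gamma fraction and the rest.
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, int(gamma * len(ranked)))
            good = [p for p, _ in ranked[:n_good]]
            bad = [p for p, _ in ranked[n_good:]]

            # Draw candidates from the "good" density l(x), clipped to the bounds.
            candidates = []
            for _ in range(n_candidates):
                c = rng.gauss(rng.choice(good), bandwidth)
                candidates.append(min(high, max(low, c)))

            # Pick the candidate with the largest ratio l(x) / g(x).
            def ratio(c):
                l_x = parzen_density(c, good, bandwidth)
                g_x = parzen_density(c, bad, bandwidth)
                return l_x / (g_x + 1e-12)

            x = max(candidates, key=ratio)

        history.append((x, objective(x)))

    best_x, best_loss = min(history, key=lambda pair: pair[1])
    return best_x, best_loss, history


def toy_loss(log_lr):
    """Hypothetical stand-in for a validation loss over log10(learning rate)."""
    return (log_lr + 1.5) ** 2 + 0.1


if __name__ == "__main__":
    best_x, best_loss, _ = tpe_minimise(toy_loss, -4.0, 0.0)
    print(f"best log10(lr) = {best_x:.3f}, loss = {best_loss:.4f}")
```

In a real study the objective would train and validate a model such as XGBoost on each trial, and a library such as Optuna (whose default sampler is TPE-based) would handle multi-dimensional and categorical search spaces.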

Source journal: Scientific Reports
Category: Natural Science Disciplines
CiteScore: 7.50
Self-citation rate: 4.30%
Articles published: 19567
Review time: 3.9 months
Journal description: We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections. Scientific Reports has a 2-year impact factor: 4.380 (2021), and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
• Engineering. Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
• Physical sciences. Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature, often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
• Earth and environmental sciences. Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.
• Biological sciences. Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms from microorganisms, animals to plants.
• Health sciences. The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.