PrOsteoporosis: predicting osteoporosis risk using NHANES data and machine learning approach

Impact factor: 1.6; JCR quartile: Q2 (Multidisciplinary Sciences)
Zebing Si, Di Zhang, Huajun Wang, Xiaofei Zheng
Journal: BMC Research Notes, vol. 18, no. 1, article 108. Published 2025-03-11. Publication type: Journal Article.
DOI: 10.1186/s13104-025-07089-3 (https://doi.org/10.1186/s13104-025-07089-3)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899459/pdf/
Citations: 0

Abstract

Objectives: Osteoporosis, prevalent among the elderly, is primarily diagnosed through bone mineral density (BMD) testing, which has limitations for early detection. This study aims to develop and validate a machine learning approach for identifying osteoporosis by integrating demographic, laboratory, and questionnaire data, offering a more practical and effective screening alternative.

Methods: Data from the National Health and Nutrition Examination Survey (NHANES) were analyzed to explore factors linked to osteoporosis. After cleaning, the dataset comprised 8766 participants and 223 variables. Minimum Redundancy Maximum Relevance (mRMR) and SelectKBest were used to select the important features. Four machine learning algorithms (RF, NN, LightGBM, and XGBoost) were applied to identify osteoporosis, and their performance was compared. Class balancing was done using SMOTE, and metrics such as the F1 score and AUC were evaluated for each algorithm.
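
The class-balancing step mentioned above can be illustrated with a small sketch. The abstract does not say which SMOTE implementation or settings were used, so the following is only a simplified, standard-library illustration of the core SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours). The function name `smote_oversample`, the parameters `n_new`, `k`, and `seed`, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority-class samples.

    Simplified SMOTE-style sketch (an assumption, not the paper's code):
    for each synthetic point, pick a random minority sample, pick one of
    its k nearest minority neighbours, and place the new point at a
    random position on the segment joining the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, sorted by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: (age in years, height in cm) for the minority class.
minority_class = [[71.0, 152.0], [68.0, 155.0], [80.0, 149.0], [75.0, 158.0]]
new_points = smote_oversample(minority_class, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies within the range spanned by the original minority data. A common practice, which the abstract neither confirms nor rules out, is to apply oversampling to the training split only, so that the test set keeps the original class distribution.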

Results: The LightGBM model outperformed others with an F1 score of 0.914, an MCC of 0.831, and an AUC of 0.970 on the training set. On the test set, it achieved an F1 score of 0.912, an MCC of 0.826, and an AUC of 0.972. Top predictors for osteoporosis were height, age, and sex.
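
For readers unfamiliar with the three reported metrics, the sketch below shows how the F1 score and the Matthews correlation coefficient (MCC) follow from confusion-matrix counts, and how AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function names and all counts and scores are hypothetical and chosen only for illustration; they are not the paper's data and do not reproduce its results.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) from binary confusion-matrix counts."""
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not taken from the paper.
f1, mcc = f1_and_mcc(tp=90, fp=10, fn=8, tn=92)
auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2])
```

F1 ignores true negatives, whereas MCC uses all four cells of the confusion matrix, which is one reason the two are often reported together for imbalanced problems.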

Conclusions: This study demonstrates the potential of machine learning models in assessing an individual's risk of developing osteoporosis, a condition that significantly impacts quality of life and imposes substantial healthcare costs. The superior performance of the LightGBM model suggests a promising tool for early detection and personalized prevention strategies. Importantly, identifying height, age, and sex as top predictors offers critical insights into the demographic and physiological factors that clinicians should consider when evaluating patients' risk profiles.

Source journal: BMC Research Notes
Subject area: Biochemistry, Genetics and Molecular Biology (all)
CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 363
Review time: 15 weeks
About the journal: BMC Research Notes publishes scientifically valid research outputs that cannot be considered as full research or methodology articles. We support the research community across all scientific and clinical disciplines by providing an open access forum for sharing data and useful information; this includes, but is not limited to, updates to previous work, additions to established methods, short publications, null results, research proposals and data management plans.