PrOsteoporosis: predicting osteoporosis risk using NHANES data and machine learning approach
Authors: Zebing Si, Di Zhang, Huajun Wang, Xiaofei Zheng
Journal: BMC Research Notes, 18(1), 108
Published: 2025-03-11
DOI: https://doi.org/10.1186/s13104-025-07089-3
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899459/pdf/
Citation count: 0
Abstract
Objectives: Osteoporosis, prevalent among the elderly population, is primarily diagnosed through bone mineral density (BMD) testing, which has limitations in early detection. This study aims to develop and validate a machine learning approach for osteoporosis identification by integrating demographic, laboratory, and questionnaire data, offering a more practical and effective screening alternative.
Methods: Data from the National Health and Nutrition Examination Survey (NHANES) were analyzed to explore factors linked to osteoporosis. After cleaning, 8766 participants with 223 variables were studied. Minimum Redundancy Maximum Relevance (mRMR) and SelectKBest were used to select the important features. Four machine learning algorithms (random forest (RF), neural network (NN), LightGBM, and XGBoost) were applied to identify osteoporosis, and their performance was compared. Classes were balanced using SMOTE, and metrics such as F1 score and AUC were evaluated for each algorithm.
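The data-balancing step can be illustrated with a minimal, standard-library sketch of the idea behind SMOTE: each synthetic minority sample is an interpolation between a real minority sample and one of its k nearest minority neighbours. This is not the authors' code. The abstract does not state which implementation or parameters were used, so the function name `smote_oversample`, the value of `k`, and the toy data below are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values, not NHANES data).
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_oversample(minority, n_new=6)
balanced_minority = minority + new_points
```

In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into evaluation; the abstract does not say how this was handled in the study.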
Results: The LightGBM model outperformed the others, with an F1 score of 0.914, a Matthews correlation coefficient (MCC) of 0.831, and an AUC of 0.970 on the training set. On the test set, it achieved an F1 score of 0.912, an MCC of 0.826, and an AUC of 0.972. The top predictors of osteoporosis were height, age, and sex.
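For readers unfamiliar with the three reported metrics, the following standard-library sketch shows how F1, MCC, and AUC can be computed for a binary classifier. The labels and scores are made-up toy values, not study data, and the helper names are hypothetical; the abstract does not say which software the authors used to compute these metrics.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f1_score(y_true, y_pred))  # 0.75
print(mcc(y_true, y_pred))       # 0.5
print(roc_auc(y_true, scores))   # 0.9375
```

F1 and MCC depend on the chosen decision threshold (0.5 here, an assumption), whereas AUC is computed from the ranking of scores and is threshold-free.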
Conclusions: This study demonstrates the potential of machine learning models in assessing an individual's risk of developing osteoporosis, a condition that significantly impacts quality of life and imposes substantial healthcare costs. The superior performance of the LightGBM model suggests a promising tool for early detection and personalized prevention strategies. Importantly, identifying height, age, and sex as top predictors offers critical insights into the demographic and physiological factors that clinicians should consider when evaluating patients' risk profiles.
BMC Research Notes (subject area: Biochemistry, Genetics and Molecular Biology (all))
CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 363
Review time: 15 weeks
About the journal:
BMC Research Notes publishes scientifically valid research outputs that cannot be considered as full research or methodology articles. We support the research community across all scientific and clinical disciplines by providing an open access forum for sharing data and useful information; this includes, but is not limited to, updates to previous work, additions to established methods, short publications, null results, research proposals and data management plans.