Wanlin Jin, Lulu Xu, Chun Yue, Li Hu, Yuzhou Wang, Yaqian Fu, Yuanwei Guo, Fan Bai, Yanyi Yang, Xianmei Zhao, Yingquan Luo, Xiyu Wu, Zhifeng Sheng
International Journal of Medical Informatics, Volume 199, Article 105889. Published 2025-03-22. DOI: 10.1016/j.ijmedinf.2025.105889. URL: https://www.sciencedirect.com/science/article/pii/S1386505625001066
Development and validation of explainable machine learning models for female hip osteoporosis using electronic health records
Background
Hip fractures are associated with reduced mobility and with higher morbidity, mortality, and healthcare costs. Approximately 90% of hip fractures in the elderly are associated with osteoporosis, which makes it particularly important to screen the population for hip osteoporosis and intervene early. Because access to dual-energy X-ray absorptiometry (DXA) is limited, predictive models for hip osteoporosis that do not rely on bone mineral density (BMD) data are essential. We aimed to develop and validate prediction models for female hip osteoporosis using electronic health records without BMD data.
Methods
This retrospective study used anonymized electronic medical records collected from September 2013 to November 2023 at the Health Management Center of the Second Xiangya Hospital. A total of 8039 women were included in the derivation dataset, which was randomly split into a training dataset (75%) and a testing dataset (25%). Four feature selection algorithms were used to identify predictors of osteoporosis, and the identified predictors were then used to train and optimize eight machine learning models. The models were tuned using 5-fold cross-validation, and their performance was assessed in the testing dataset and in an independent validation dataset from the National Health and Nutrition Examination Survey (NHANES). The SHapley Additive exPlanations (SHAP) method was used to rank feature importance and explain the final model.
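The data partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a simple unstratified shuffle (the abstract does not say whether the split was stratified), and the function names and seed are hypothetical. Only the cohort size (8039), the 75%/25% split, and the 5 folds come from the text.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle row indices 0..n-1 and split them into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    # Deal the indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# 8039 women in the derivation dataset, split 75% / 25%.
train_idx, test_idx = train_test_split_indices(8039, test_fraction=0.25)
print(len(train_idx), len(test_idx))

# Hyperparameters would be tuned on these folds of the training data only;
# the held-out testing dataset is never touched during tuning.
for fold_train, fold_val in k_fold_indices(train_idx, k=5):
    print(len(fold_train), len(fold_val))
```

Keeping the testing indices out of the cross-validation loop is what allows the testing dataset to give an unbiased estimate of performance after tuning.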
Results
A combination of the Boruta, LASSO, varSelRF, and RFE methods identified systolic blood pressure, red blood cell count, glycohemoglobin, alanine aminotransferase, aspartate aminotransferase, uric acid, age, and body mass index as the most important predictors of osteoporosis in women. The XGBoost model outperformed the other models, with an area under the curve (AUC) of 0.805 (95% CI: 0.779–0.831) and a moderate sensitivity of 0.706. In external validation, the XGBoost model had an AUC of 0.811 (95% CI: 0.793–0.828) and a moderate sensitivity of 0.775.
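To make the reported metrics concrete, the sketch below computes an AUC, a sensitivity at a fixed threshold, and a percentile bootstrap confidence interval on toy data. The abstract does not state how the 95% CIs were obtained or which probability threshold was used, so the bootstrap, the 0.5 threshold, the toy labels and scores, and all function names are assumptions made for illustration.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(y_true, scores, threshold):
    """Fraction of true positives among all actual positives."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    return sum(1 for s in positives if s >= threshold) / len(positives)


def bootstrap_auc_ci(y_true, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in sample]
        if len(set(ys)) < 2:  # resample if one class is missing
            continue
        aucs.append(roc_auc(ys, [scores[i] for i in sample]))
    aucs.sort()
    cut = int(n_boot * alpha / 2)
    return aucs[cut], aucs[n_boot - 1 - cut]


# Toy labels (1 = osteoporosis) and predicted probabilities.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p))           # 0.75
print(sensitivity(y, p, 0.5))  # 0.5

y_big = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
p_big = [0.1, 0.2, 0.3, 0.35, 0.5, 0.6, 0.4, 0.55, 0.7, 0.9]
print(bootstrap_auc_ci(y_big, p_big))
```

The pairwise AUC is quadratic in the sample size, which is fine for a sketch; a rank-based implementation would be preferable at the scale of the datasets described here.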
Conclusions
The XGBoost model demonstrates high identification performance even without questionnaire data, outperforming both the traditional logistic regression model and the OSTA model. It can be integrated into routine clinical workflows to identify females at high risk for osteoporosis.
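For context on the OSTA comparator: the Osteoporosis Self-Assessment Tool for Asians is commonly described as 0.2 × (weight in kg − age in years), truncated to an integer, with values above −1 read as low risk, −4 to −1 as medium risk, and below −4 as high risk. The abstract does not say which definition or cutoff the authors applied, so the sketch below only illustrates those commonly cited conventions; the function names are hypothetical.

```python
import math


def osta_index(weight_kg, age_years):
    """OSTA index: 0.2 * (weight - age), truncated toward zero.

    Dividing by 5 is equivalent to multiplying by 0.2 and avoids
    floating-point rounding on whole-number inputs.
    """
    return math.trunc((weight_kg - age_years) / 5)


def osta_risk(index):
    """Map an OSTA index to the commonly cited risk categories (assumed cutoffs)."""
    if index > -1:
        return "low"
    if index >= -4:
        return "medium"
    return "high"


# Hypothetical patients, not data from the study.
for weight, age in [(60, 50), (50, 70), (45, 75)]:
    idx = osta_index(weight, age)
    print(weight, age, idx, osta_risk(idx))
```

Because OSTA uses only age and body weight, it serves as a simple baseline against which a model built on eight routinely collected predictors can be compared.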
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.