Development and validation of explainable machine learning models for female hip osteoporosis using electronic health records

IF 3.7 | Zone 2 (Medicine) | JCR Q2, Computer Science, Information Systems
Wanlin Jin, Lulu Xu, Chun Yue, Li Hu, Yuzhou Wang, Yaqian Fu, Yuanwei Guo, Fan Bai, Yanyi Yang, Xianmei Zhao, Yingquan Luo, Xiyu Wu, Zhifeng Sheng
International Journal of Medical Informatics, volume 199, Article 105889. Published 2025-03-22. Journal article; not open access.
DOI: 10.1016/j.ijmedinf.2025.105889
https://www.sciencedirect.com/science/article/pii/S1386505625001066
Citations: 0

Abstract

Background

Hip fractures are associated with reduced mobility and with higher morbidity, mortality, and healthcare costs. Approximately 90% of hip fractures in the elderly are associated with osteoporosis, which makes it particularly important to screen the population for hip osteoporosis and to intervene early. Dual-energy X-ray absorptiometry (DXA) has limited accessibility, so predictive models for hip osteoporosis that do not rely on bone mineral density (BMD) data are essential. We aimed to develop and validate prediction models for female hip osteoporosis using electronic health records without BMD data.

Methods

This retrospective study used anonymized electronic medical records, collected from September 2013 to November 2023, from the Health Management Center of the Second Xiangya Hospital. A total of 8039 women were included in the derivation dataset, which was randomly split into a 75% training dataset and a 25% testing dataset. Four feature selection algorithms were used to identify predictors of osteoporosis. The identified predictors were then used to train and optimize eight machine learning models. The models were tuned using 5-fold cross-validation, and their performance was assessed in the testing dataset and in an independent validation dataset from the National Health and Nutrition Examination Survey (NHANES). The SHapley Additive exPlanations (SHAP) method was used to rank feature importance and explain the final model.
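The data partitioning described in the Methods can be illustrated with a short sketch. This is not the authors' code: the abstract does not say whether the split was stratified, which random seed was used, or how the fold sizes were rounded, so the seed, the rounding rule, and the function names `train_test_split_indices` and `k_fold_indices` are assumptions made purely for illustration.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n-1 and return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # seed is an illustrative assumption
    n_test = round(n * test_fraction)      # rounding rule is an assumption
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        validate = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


# 8039 women in the derivation dataset, split 75% / 25%.
train, test = train_test_split_indices(8039)

# Hyperparameters would be tuned on these folds of the training data only;
# the 25% testing dataset is held back for the final assessment.
folds = list(k_fold_indices(train, k=5))
print(len(train), len(test), [len(validate) for _, validate in folds])
```

Keeping the testing indices out of every fold is the point of the design: tuning sees only training data, so the reported test performance is not inflated by the hyperparameter search.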

Results

A combination of the Boruta, LASSO, varSelRF, and RFE methods identified systolic blood pressure, red blood cell count, glycohemoglobin, alanine aminotransferase, aspartate aminotransferase, uric acid, age, and body mass index as the most important predictors of osteoporosis in women. The XGBoost model outperformed the other models, with an area under the curve (AUC) of 0.805 (95% CI: 0.779–0.831) and a moderate sensitivity of 0.706. In external validation, the XGBoost model had an AUC of 0.811 (95% CI: 0.793–0.828) and a moderate sensitivity of 0.775.
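For readers unfamiliar with the reported metrics, the following sketch shows how an AUC, a sensitivity, and a 95% confidence interval can be computed from predicted risk scores. The abstract does not state how the authors derived their intervals or which decision threshold they used, so the percentile bootstrap, the threshold of 0.5, and the toy labels and scores below are all assumptions, not study data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """True-positive rate: share of positive cases scored at or above threshold."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example (1 = osteoporosis, 0 = no osteoporosis); not study data.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6]

toy_auc = auc(labels, scores)
toy_sens = sensitivity(labels, scores, threshold=0.5)
ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_boot=200)
print(toy_auc, toy_sens, (ci_low, ci_high))
```

The pairwise definition is quadratic in the sample size, which is fine for a sketch; a rank-based implementation would be preferable for datasets of several thousand records.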

Conclusions

The XGBoost model demonstrates high identification performance even without questionnaire data, outperforming both the traditional logistic regression model and the Osteoporosis Self-Assessment Tool for Asians (OSTA) model. It can be integrated into routine clinical workflows to identify women at high risk for osteoporosis.
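As context for the OSTA comparison, the sketch below computes the index in its commonly described form: 0.2 × (body weight in kg − age in years), truncated to an integer, with values above −1 read as low risk, −4 to −1 as medium risk, and below −4 as high risk. The abstract does not say which OSTA cutoffs the authors applied, so these categories and the function names `osta_index` and `osta_risk` are assumptions.

```python
import math


def osta_index(weight_kg, age_years):
    """OSTA index: 0.2 * (weight - age), truncated toward zero."""
    # Dividing by 5 instead of multiplying by 0.2 avoids float rounding noise.
    return math.trunc((weight_kg - age_years) / 5)


def osta_risk(weight_kg, age_years):
    """Map the index to the commonly cited risk categories (assumed cutoffs)."""
    index = osta_index(weight_kg, age_years)
    if index > -1:
        return "low"
    if index >= -4:
        return "medium"
    return "high"


# Hypothetical women, not study participants.
for weight, age in [(65, 50), (50, 70), (45, 75)]:
    print(weight, age, osta_index(weight, age), osta_risk(weight, age))
```

Because OSTA uses only age and weight, it is a natural low-cost baseline for a model that draws on eight routine clinical predictors.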
Source journal
International Journal of Medical Informatics (Medicine; Computer Science, Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
About the journal: International Journal of Medical Informatics provides an international medium for the dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.