Comparative Analysis of Feature Extraction Methods and Machine Learning Models for Predicting Osteoporosis Prevalence

IF 5.7 | Zone 3 (Medicine) | JCR Q1: HEALTH CARE SCIENCES & SERVICES

Journal of Medical Systems Pub Date : 2025-05-29 DOI:10.1007/s10916-025-02203-1

Danni Zhang, Xingyu Yang, Fangying Wang, Cifang Qiu, Yanfu Chai, Danruo Fang

Citations: 0

Abstract

This study systematically examined the impact of three feature selection techniques (Boruta, extreme gradient boosting (XGBoost), and Lasso) on four machine learning models (random forest (RF), XGBoost, logistic regression (LR), and support vector machine (SVM)) for predicting osteoporosis prevalence. Our findings revealed that varying the data partitioning ratio (training set to test set: 0.6:0.4, 0.7:0.3, 0.8:0.2, and 0.9:0.1) had minimal impact on prediction accuracy across all four models, a conclusion reinforced by 10-fold cross-validation. In addition, principal component analysis (PCA) led to substantial accuracy degradation (0.6-0.8 range), suggesting that it is incompatible with this study's requirements because of the inherently complex decision boundaries in the original high-dimensional data. Comparative analysis demonstrated that the Boruta-XGBoost combination achieved superior performance (accuracy: 0.9083 ± 0.0146), significantly outperforming the Lasso-LR combination (0.7480 ± 0.0157) across all evaluation frameworks. Regarding model evaluation metrics, the RF model exhibited greater discriminative capacity, with area under the receiver operating characteristic curve (AUROC) values of 0.85, 0.81, and 0.80 under the different feature selection approaches, surpassing the SVM model (0.78, 0.76, and 0.76). This advantage likely stems from RF's native capability to capture non-linear relationships and feature interactions.
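The abstract reports accuracy as mean ± standard deviation under 10-fold cross-validation and compares models by AUROC. The sketch below illustrates how those two quantities are typically computed. It is not the authors' code: the data are synthetic, the one-feature midpoint classifier is a toy stand-in for the RF, XGBoost, LR, and SVM models, and every function name (`k_fold_indices`, `cross_validate`, `auroc`, `fit_midpoint`, `predict_midpoint`) is hypothetical.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices once and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return (mean, sample standard deviation) of accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as half (Mann-Whitney U formulation)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_midpoint(X, y):
    """Toy stand-in model: threshold halfway between the two class means."""
    mean0 = statistics.mean([x for x, label in zip(X, y) if label == 0])
    mean1 = statistics.mean([x for x, label in zip(X, y) if label == 1])
    return (mean0 + mean1) / 2


def predict_midpoint(threshold, x):
    """Predict class 1 when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0


# Synthetic, balanced data: one feature whose mean shifts by 2 for class 1.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [rng.gauss(2.0 * label, 1.0) for label in y]

mean_acc, sd_acc = cross_validate(X, y, fit_midpoint, predict_midpoint, k=10)
print(f"10-fold accuracy: {mean_acc:.4f} ± {sd_acc:.4f}")
print(f"AUROC of the raw feature: {auroc(y, X):.2f}")
```

In a comparison like the one described, any feature selection step would presumably need to run inside `fit`, on the training folds only, so that the held-out fold does not influence which features are kept.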

Source journal

Journal of Medical Systems (Medicine: Health Care)

CiteScore: 11.60

Self-citation rate: 1.90%

Articles published: not listed

Review time: 4.8 months

About the journal: Journal of Medical Systems provides a forum for the presentation and discussion of the increasingly extensive applications of new systems techniques and methods in hospital, clinic, and physician's office administration; pathology, radiology, and pharmaceutical delivery systems; medical records storage and retrieval; and ancillary patient-support systems. The journal publishes informative articles, essays, and studies across the entire scale of medical systems, from large hospital programs to novel small-scale medical services. Education is an integral part of this amalgamation of sciences, and selected articles are published in this area. Since existing medical systems are constantly being modified to fit particular circumstances and to solve specific problems, the journal includes a special section devoted to status reports on current installations.