XGBoost-SHAP-based interpretable diagnostic framework for knee osteoarthritis: a population-based retrospective cohort study

IF 4.9, Zone 2 (Medicine), JCR Q1 (Medicine)
Zijuan Fan, Wenzhu Song, Yan Ke, Ligan Jia, Songyan Li, Jiao Jiao Li, Yuqing Zhang, Jianhao Lin, Bin Wang
Journal: Arthritis Research & Therapy
Publication date: 2024-12-19
Publication type: Journal Article
DOI: 10.1186/s13075-024-03450-2
URL: https://doi.org/10.1186/s13075-024-03450-2
Citations: 0

Abstract

This study aimed to use routine demographic and clinical data to develop an interpretable, individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features. In this retrospective, population-based cohort study, anonymized questionnaire data were retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selection, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set, and their performance was further analyzed on an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve (AUC), G-means, and F1 scores. The best model was explained using Shapley values to extract highly ranked features. A total of 1188 participants were investigated, of whom 26.3% were diagnosed with KOA. XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, G-means of 0.800, and an F1 score of 0.703. The SHAP method revealed the top 17 features of KOA by importance ranking, and the average experience of joint pain was recognized as the most important feature. The study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA and that may guide new prevention strategies. Further work is needed to validate this approach.
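The threshold-based metrics listed in the abstract can all be derived from the four cells of a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: the function name `diagnostic_metrics` and the counts passed to it are hypothetical, and the G-mean is assumed to follow the common definition, the square root of sensitivity times specificity. AUC is left out because it needs predicted scores rather than counts.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes are present
    and the classifier predicts each class at least once.
    """
    sensitivity = tp / (tp + fn)   # recall on KOA cases
    specificity = tn / (tn + fp)   # recall on non-KOA cases
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    # F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f1": f1,
    }


# Hypothetical counts for illustration; these are not results from the study.
metrics = diagnostic_metrics(tp=80, fp=60, tn=240, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With an imbalanced outcome such as the 26.3% KOA prevalence reported here, accuracy alone can look good for a model that rarely predicts the minority class, which is one reason to report G-mean and F1 alongside it.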
Source journal: Arthritis Research & Therapy
CiteScore: 8.60
Self-citation rate: 2.00%
Articles published: 261
Review time: 14 weeks
Journal description: Established in 1999, Arthritis Research and Therapy is an international, open access, peer-reviewed journal publishing original articles in the area of musculoskeletal research and therapy, as well as reviews, commentaries and reports. A major focus of the journal is on the immunologic processes leading to inflammation, damage and repair as they relate to autoimmune rheumatic and musculoskeletal conditions, and which inform the translation of this knowledge into advances in clinical care. Original basic, translational and clinical research is considered for publication, along with results of early and late phase therapeutic trials, especially as they pertain to the underpinning science that informs clinical observations in interventional studies.