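Explainable machine learning identifies key quality-of-life-related predictors of arthritis status: evidence from the China Health and Retirement Longitudinal Study.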

IF 3.4 | Zone 2, Medicine | JCR Q1: HEALTH CARE SCIENCES & SERVICES
Kaibin Lin, Tingting Jiang, Jiafen Liao, Xianrun Zhou, Zheng Wang, Yiyue Chen, Xi Xu, Bing Zhou
Journal: Health and Quality of Life Outcomes, 23(1): 80
Publication date: 2025-08-26
Publication type: Journal Article
DOI: 10.1186/s12955-025-02412-9 (https://doi.org/10.1186/s12955-025-02412-9)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12381994/pdf/
Citations: 0

Abstract

Explainable machine learning identifies key quality-of-life-related predictors of arthritis status: evidence from the China Health and Retirement Longitudinal Study.

Background: Arthritis is a prevalent chronic disease substantially impacting patients' quality of life (QoL). While identifying key determinants associated with arthritis is critical for targeted interventions, traditional statistical methods often struggle with complex interactions, and existing machine learning (ML) approaches frequently lack the interpretability needed to guide clinical decisions. This study integrates a comprehensive, explainable machine learning (XAI) workflow to identify and interpret key QoL-related predictors of arthritis status in a large national cohort.

Methods: Data were obtained from 15,011 participants aged >45 years in the 2020 China Health and Retirement Longitudinal Study (CHARLS). We initially selected 55 potential QoL-related predictors spanning demographic, functional, pain, psychosocial, and lifestyle domains. Feature engineering was performed to create aggregate scores, indicators, and binned variables. Missing data were handled using imputation combined with missing indicator variables. A LightGBM-based feature selection process identified 68 key predictors. Nine ML models (Logistic Regression, RandomForest, GradientBoosting, LightGBM, CatBoost, XGBoost, DecisionTree, NaiveBayes, KNN) were developed using SMOTE-resampled training data, with hyperparameters optimized via Optuna targeting recall. Performance was evaluated on a held-out test set using Area Under the ROC Curve (AUC), Average Precision (AP), Recall, Specificity, Precision, and F1-score. SHapley Additive exPlanations (SHAP) analysis was applied to the best-performing model (GradientBoosting) for interpretation.
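
To make the evaluation metrics named in the Methods concrete, here is a minimal pure-Python sketch of Recall, Specificity, Precision, F1-score, and AUC for a binary classifier. It is an illustration only, not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions, and Average Precision is omitted for brevity.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Recall, specificity, precision and F1 at a fixed decision threshold.

    Labels are 0/1 with 1 as the positive class (here: has arthritis).
    The 0.5 default threshold is an assumption, not taken from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0     # positive predictive value
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predicted probabilities (made up for illustration).
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    s = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
    print(threshold_metrics(y, s))
    print("AUC:", roc_auc(y, s))
```

The pairwise AUC is quadratic in sample size, which is fine for a toy example; on a cohort of this size a rank-based implementation from an established library would be the practical choice.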

Results: Several models achieved strong predictive performance, with GradientBoosting yielding the highest AUC (0.767, 95% CI: 0.752-0.782) and high AP (0.678, 95% CI: 0.655-0.702). SHAP analysis identified multi-site pain burden (particularly knee/leg pain and pain location count), age, self-rated health, sleep quality, functional limitations (ADL counts/scores), and negative affect as the most influential predictors driving arthritis status prediction.
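
The SHAP rankings reported above rest on Shapley values, which split a single prediction among the input features. The sketch below computes exact Shapley values by brute force for a tiny hand-written scoring function. Everything in it is hypothetical: the feature names, coefficients, and baseline are invented for illustration and are not the paper's fitted GradientBoosting model, and the SHAP library uses far more efficient algorithms for tree ensembles than this enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(predict: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features outside a coalition are held at their baseline value.
    Cost grows as 2**n in the number of features, so this is only
    suitable for tiny illustrative examples.
    """
    names = list(instance)
    n = len(names)
    phi = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {f: (instance[f] if f in coalition else baseline[f])
                           for f in names}
                with_target = dict(without)
                with_target[target] = instance[target]
                total += weight * (predict(with_target) - predict(without))
        phi[target] = total
    return phi


def toy_risk_score(x: Dict[str, float]) -> float:
    """Hypothetical score with one interaction term (not the paper's model)."""
    return (0.3 * x["knee_pain"] + 0.01 * x["age"]
            + 0.2 * x["knee_pain"] * x["poor_sleep"])


if __name__ == "__main__":
    person = {"knee_pain": 1, "age": 70, "poor_sleep": 1}
    reference = {"knee_pain": 0, "age": 60, "poor_sleep": 0}
    phi = shapley_values(toy_risk_score, person, reference)
    for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name}: {value:+.3f}")
```

In this toy case the knee-pain/sleep interaction is split evenly between the two features, and the attributions sum to the difference between the person's score and the reference score. That additivity is what allows per-feature contributions to be ranked.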

Conclusions: This study successfully applied an XAI pipeline to identify and rank key QoL-related factors predictive of arthritis status in a large Chinese cohort, achieving robust model performance. Pain burden, age, subjective health, sleep, functional status, and psychological well-being are critical determinants. These interpretable findings can inform risk stratification and guide targeted interventions focusing on these key areas to potentially improve arthritis management.

Source journal
CiteScore: 7.30
Self-citation rate: 2.80%
Articles published: 154
Review time: 3-8 weeks
Journal description: Health and Quality of Life Outcomes is an open access, peer-reviewed journal offering high-quality articles, rapid publication, and wide diffusion in the public domain. It considers original manuscripts on Health-Related Quality of Life (HRQOL) assessment for the evaluation of medical and psychosocial interventions. It also considers approaches and studies on the psychometric properties of HRQOL and patient-reported outcome measures, including cultural validation of instruments if they provide information about the impact of interventions. The journal publishes study protocols and reviews summarising the present state of knowledge concerning a particular aspect of HRQOL and patient-reported outcome measures. Reviews should generally follow systematic review methodology. Comments on articles and letters to the editor are welcome.