Prediction of depressive disorder using machine learning approaches: findings from the NHANES.

IF 3.3 · Zone 3 (Medicine) · JCR Q2 (Medical Informatics)
Thien Vu, Research Dawadi, Masaki Yamamoto, Jie Ting Tay, Naoki Watanabe, Yuki Kuriya, Ai Oya, Phap Ngoc Hoang Tran, Michihiro Araki
BMC Medical Informatics and Decision Making, 25(1):83. Published 2025-02-17. Journal Article.
DOI: 10.1186/s12911-025-02903-1 (https://doi.org/10.1186/s12911-025-02903-1)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834192/pdf/

Abstract

Background: Depressive disorders, particularly major depressive disorder (MDD), significantly impact individuals and society. Traditional analysis methods often suffer from subjectivity and may not capture complex, non-linear relationships between risk factors. Machine learning (ML) offers a data-driven approach to predict and diagnose depression more accurately by analyzing large and complex datasets.

Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) 2013-2014 to predict depression with six supervised ML models: Logistic Regression, Random Forest, Naive Bayes, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Depression was assessed using the Patient Health Questionnaire (PHQ-9), with a score of 10 or higher indicating moderate to severe depression. The dataset was split into training and testing sets (80% and 20%, respectively), and model performance was evaluated using accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP (SHapley Additive exPlanations) values were used to identify the critical risk factors and to interpret the contribution of each feature to the prediction.
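The labeling rule, the 80/20 split, and the evaluation metrics named in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`phq9_label`, `split_80_20`, `binary_metrics`, `auc`), the random seed, and the toy inputs are all hypothetical. Only the PHQ-9 cutoff of 10, the 80/20 proportions, and the list of metrics come from the abstract; the PHQ-9 itself consists of nine items, each scored 0 to 3.

```python
import random


def phq9_label(item_scores, cutoff=10):
    """Return 1 if the PHQ-9 total reaches the cutoff (moderate to severe), else 0."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 expects nine items, each scored 0-3")
    return int(sum(item_scores) >= cutoff)


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it into 80% train and 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1 from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because depression is usually the minority class in survey data, accuracy alone can look high for a model that rarely predicts the positive class, which is one reason to report sensitivity, precision, F1, and AUC alongside it.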

Results: XGBoost was identified as the best-performing model, achieving the highest accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP analysis highlighted the most significant predictors of depression: the ratio of family income to poverty (PIR), sex, hypertension, serum cotinine and hydroxycotinine, BMI, education level, glucose levels, age, marital status, and renal function (eGFR).
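SHAP attributions are built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating feature subsets for a hypothetical three-feature scoring function. It does not use the `shap` library and does not reproduce the study's XGBoost model; the function `toy_risk`, its weights, the feature names (`pir`, `high_cotinine`, `hypertension`), and the baseline values are invented purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for predict at x; absent features take baseline values."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Features in the subset come from x, all others from the baseline.
        point = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(p):
    # Hypothetical score with two linear terms and one interaction (not from the paper).
    return 0.5 * p["high_cotinine"] - 0.2 * p["pir"] + 0.3 * p["high_cotinine"] * p["hypertension"]


x = {"pir": 1.0, "high_cotinine": 1, "hypertension": 1}
baseline = {"pir": 3.0, "high_cotinine": 0, "hypertension": 0}
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the score at `x` and the score at `baseline`, and the interaction term is shared equally between the two features involved. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on model-specific or sampling-based approximations.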

Conclusion: We developed ML models to predict depression and utilized SHAP for interpretation. This approach identifies key factors associated with depression, encompassing socioeconomic, demographic, and health-related aspects.

Source journal
CiteScore: 7.20; self-citation rate: 5.70%; articles published: 297; review time: 1 month.
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.