Explainable AI for Depression Detection and Severity Classification From Activity Data: Development and Evaluation Study of an Interpretable Framework.

IF 5.8, Zone 2 (Medicine), JCR Q1 (Psychiatry)

JMIR Mental Health, vol. 12, e72038. Pub Date: 2025-09-11. DOI: 10.2196/72038

Iftikhar Ahmed, Anushree Brahmacharimayum, Raja Hashim Ali, Talha Ali Khan, Muhammad Ovais Ahmad


Citations: 0

Abstract

Background: Depression is one of the most prevalent mental health disorders globally, affecting approximately 280 million people and frequently going undiagnosed or misdiagnosed. The growing ubiquity of wearable devices enables continuous monitoring of activity levels, providing a new avenue for data-driven detection and severity assessment of depression. However, existing machine learning models often exhibit lower performance when distinguishing overlapping subtypes of depression and frequently lack explainability, an essential component for clinical acceptance.

Objective: This study aimed to develop and evaluate an interpretable machine learning framework for detecting depression and classifying its severity using wearable-actigraphy data, while addressing common challenges such as imbalanced datasets and limited model transparency.

Methods: We used the Depresjon dataset and applied Adaptive Synthetic Sampling (ADASYN) to mitigate class imbalance. We extracted multiple statistical features (eg, power spectral density mean and autocorrelation) and demographic attributes (eg, age) from the raw activity data. Five machine learning algorithms (logistic regression, support vector machines, random forest, XGBoost, and neural networks) were assessed via accuracy, precision, recall, F1-score, specificity, and Matthews correlation coefficient. We further used Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) to elucidate prediction drivers.
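
As an illustration of the two statistical features named in Methods, the sketch below computes a lag autocorrelation and a mean power spectral density from a single activity series. The abstract does not state the estimators, window lengths, or sampling rate used in the study, so everything here is an assumption: the function names `lag_autocorrelation` and `psd_mean`, the plain periodogram (rather than, say, Welch's method), and the synthetic hourly series are all hypothetical.

```python
import cmath
import math
from statistics import fmean


def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at `lag` (biased estimator)."""
    n = len(series)
    mean = fmean(series)
    centered = [x - mean for x in series]
    variance_sum = sum(c * c for c in centered)
    if variance_sum == 0 or lag >= n:
        return 0.0
    cross = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return cross / variance_sum


def psd_mean(series):
    """Mean periodogram power over frequency bins 1..n//2 (DC bin excluded)."""
    n = len(series)
    if n < 2:
        return 0.0
    mean = fmean(series)
    centered = [x - mean for x in series]
    powers = []
    for k in range(1, n // 2 + 1):
        # Naive discrete Fourier transform; O(n^2), fine for a short sketch.
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        powers.append(abs(coeff) ** 2 / n)
    return fmean(powers)


# Hypothetical activity counts: four days of hourly samples with a 24-hour rhythm.
activity = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(96)]
features = {
    "autocorr_lag24": lag_autocorrelation(activity, lag=24),
    "autocorr_lag12": lag_autocorrelation(activity, lag=12),
    "psd_mean": psd_mean(activity),
}
print(features)
```

In this toy series, a strong positive autocorrelation at the 24-hour lag and a negative one at the 12-hour lag reflect a regular daily rhythm; a flatter autocorrelation profile would be one possible signature of a disrupted rhythm.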

Results: XGBoost achieved the highest overall accuracy of 84.94% for binary classification and 85.91% for multiclass severity. SHAP and LIME revealed power spectral density mean, age, and autocorrelation as top predictors, highlighting circadian disruptions' role in depression.
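
Accuracy is only one of the metrics listed in Methods, and on imbalanced data it can look better than the classifier really is. The sketch below shows how accuracy, precision, recall, F1-score, specificity, and the Matthews correlation coefficient (MCC) all follow from one binary confusion matrix. The function name `binary_metrics` and the label vectors are hypothetical and are not data from the study.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute common binary metrics; label 1 = depressed, 0 = nondepressed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical labels for ten participants (not taken from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative when one class dominates, which is one reason to report it alongside accuracy.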

Conclusions: Our interpretable framework reliably identifies depressed versus nondepressed individuals and differentiates mild from moderate depression. The inclusion of SHAP and LIME provides transparent, clinically meaningful insights, emphasizing the potential of explainable artificial intelligence to enhance early detection and intervention strategies in mental health care.
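
To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a toy three-feature scoring function. It is not the study's model and not the SHAP library's algorithm, which relies on faster estimators; the function names `shapley_values` and `toy_score`, the coefficients, the feature values, and the baseline are all hypothetical. Features absent from a coalition are replaced by the baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(instance)

    def value(subset):
        # Features in the coalition take the instance value, others the baseline.
        mixed = [instance[j] if j in subset else baseline[j] for j in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


def toy_score(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    psd, age, autocorr = features
    return 0.001 * psd + 0.02 * age - 1.5 * autocorr + 0.01 * age * autocorr


names = ["psd_mean", "age", "autocorrelation"]
instance = [1250.0, 40.0, 0.75]  # hypothetical participant
baseline = [800.0, 35.0, 0.90]   # hypothetical reference values
phis = shapley_values(toy_score, instance, baseline)
for name, phi in zip(names, phis):
    print(f"{name}: {phi:+.4f}")
```

The contributions sum to the difference between the score for the instance and the score for the baseline, which is the additivity property that lets per-feature attributions be read as a decomposition of a single prediction.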




Source journal

JMIR Mental Health (Medicine: Psychiatry and Mental Health)

CiteScore: 10.80

Self-citation rate: 3.80%

Articles published: 104

Review time: 16 weeks

About the journal: JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175). JMIR Mental Health focusses on digital health and Internet interventions, technologies and electronic innovations (software and hardware) for mental health, addictions, online counselling and behaviour change. This includes formative evaluation and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.