A data-centric and interpretable EEG framework for depression severity grading using SHAP-based insights.

IF 5.2, Zone 2 (Medicine), Q1 ENGINEERING, BIOMEDICAL

Journal of NeuroEngineering and Rehabilitation. Pub Date: 2025-05-25. DOI: 10.1186/s12984-025-01645-5

Anruo Shen, Jingnan Sun, Xiaogang Chen, Xiaorong Gao

Journal of NeuroEngineering and Rehabilitation, Volume 22(1), 116. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12103758/pdf/

Citations: 0

Abstract

Background: Major Depressive Disorder (MDD) is a leading cause of disability worldwide. Accurate assessment of depression severity is critical for diagnosis, treatment planning, and monitoring, yet current clinical tools are largely subjective, relying on self-report and clinician judgment via traditional assessment scales. EEG has emerged as a promising, non-invasive modality for capturing neural correlates of depression. However, most EEG-based machine learning diagnostic studies focus on boosting classification accuracy through complex algorithms applied to small, homogeneous datasets. These black-box approaches often yield results that are difficult to interpret and generalize poorly, making clinical translation impractical. There therefore remains a critical need for models that are not only accurate but also transparent, robust, and grounded in the physiological properties of the data itself.

Methods: We proposed a data-centric, interpretable framework for EEG-based depression severity grading. A hybrid feature selection method combined p-value screening with SHapley Additive exPlanations (SHAP) to select features that are both independently significant and jointly informative. The system was trained and evaluated on a large-scale, multi-site resting-state EEG dataset, using random forests for both the classification and regression tasks. SHAP, an explainable artificial intelligence technique, was also applied post hoc to identify the key electrophysiological features and brain regions associated with MDD mechanisms, further increasing interpretability.
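
As a rough illustration of the two ideas in the Methods paragraph, the Python sketch below computes exact Shapley values for a toy three-feature model and then applies one possible hybrid rule: keep features that pass a p-value threshold and also rank among the top k by mean absolute SHAP value. The toy model, the feature names, the p-values, the thresholds `alpha` and `top_k`, and the intersection rule itself are assumptions made for illustration. The abstract does not specify how the two criteria were combined, and SHAP values for a random forest would normally come from a dedicated library rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size (excluding feature i).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def hybrid_select(p_values, mean_abs_shap, alpha=0.05, top_k=2):
    """Assumed hybrid rule: top_k features by mean |SHAP| that also have p < alpha."""
    ranked = sorted(mean_abs_shap, key=mean_abs_shap.get, reverse=True)[:top_k]
    return [name for name in ranked if p_values[name] < alpha]


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print("Shapley values:", phi)

# Hypothetical per-feature statistics, not taken from the study.
p_values = {"feat_a": 0.001, "feat_b": 0.20, "feat_c": 0.01}
mean_abs_shap = {"feat_a": 0.30, "feat_b": 0.50, "feat_c": 0.10}
print("Selected:", hybrid_select(p_values, mean_abs_shap, alpha=0.05, top_k=2))
```

The Shapley values always sum to the difference between the model output at `x` and at `baseline`, which is the additivity property that makes per-feature attributions comparable across samples.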

Results: The proposed system achieved a ten-fold cross-validation classification accuracy of 74.5% (95% CI [70.97%, 78.80%], p < 0.001) and a correlation coefficient of 0.56 (95% CI [0.407, 0.683], p < 0.001) for severity estimation. SHAP analysis identified consistent, clinically meaningful EEG features, particularly in the left parieto-occipital region. In-depth analysis of the SHAP values pointed to disease-related areas in the left occipital and parietal lobes, and to key features including relative beta power in the left parietal lobe, time-domain features at the parietal midline, the 1/f intercept, left occipital relative beta power, and global alpha energy.
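
The Results report a correlation coefficient with a 95% confidence interval, but the abstract does not say how the interval was obtained. One common choice is the Fisher z-transform, sketched below in Python on made-up scores. The data, the function names, and the use of the Fisher method are assumptions for illustration, not the authors' procedure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for r via the Fisher z-transform.

    z_crit=1.96 corresponds to a two-sided 95% interval.
    Requires n > 3 and |r| < 1.
    """
    if n <= 3 or abs(r) >= 1.0:
        raise ValueError("need n > 3 and |r| < 1")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Made-up clinical scores and model estimates, purely illustrative.
true_scores = [5, 12, 9, 20, 15, 7, 18, 11]
predicted_scores = [8, 10, 11, 16, 14, 9, 15, 13]

r = pearson_r(true_scores, predicted_scores)
low, high = fisher_ci(r, n=len(true_scores))
print(f"r = {r:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Because the standard error is 1/sqrt(n - 3), the interval narrows as the sample grows, which is one reason a large multi-site dataset gives a more informative estimate than a small single-site one.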

Conclusion: This study proposes a data-centric, interpretable depression grading system built on large-scale, multi-center EEG data. It uses simple models and hybrid feature selection to emphasize explainability, generalizability, and data fidelity. By shifting the focus from algorithmic complexity to data transparency and feature-level insight, the model offers a practical and trustworthy path toward real-world mental health assessment.

Source journal

Journal of NeuroEngineering and Rehabilitation (Engineering & Technology: Engineering, Biomedical)

CiteScore

9.60

Self-citation rate

3.90%

Articles published

122

Review time

24 months

About the journal: Journal of NeuroEngineering and Rehabilitation considers manuscripts on all aspects of research that result from cross-fertilization of the fields of neuroscience, biomedical engineering, and physical medicine & rehabilitation.