A data-centric and interpretable EEG framework for depression severity grading using SHAP-based insights.
Authors: Anruo Shen, Jingnan Sun, Xiaogang Chen, Xiaorong Gao
Journal: Journal of NeuroEngineering and Rehabilitation, volume 22, issue 1, article 116 (Impact Factor 5.2; JCR Q1, Engineering, Biomedical)
Publication date: 2025-05-25
Publication type: Journal Article
DOI: https://doi.org/10.1186/s12984-025-01645-5
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12103758/pdf/
Background: Major Depressive Disorder (MDD) is a leading cause of disability worldwide. Accurate assessment of depression severity is critical for diagnosis, treatment planning, and monitoring, yet current clinical tools are largely subjective, relying on self-report and clinician judgment via traditional assessment scales. Electroencephalography (EEG) has emerged as a promising, non-invasive modality for capturing neural correlates of depression. However, most EEG-based machine learning diagnostic studies focus on boosting classification accuracy through complex algorithms applied to small, homogeneous datasets. These black-box approaches often yield results that are difficult to interpret and generalize poorly, making clinical translation impractical. There therefore remains a critical need for models that are not only accurate but also transparent, robust, and grounded in the physiological properties of the data itself.
Methods: We propose a data-centric, interpretable framework for EEG-based depression severity grading. A hybrid feature selection method combines p-values with SHapley Additive exPlanations (SHAP) to select features that are both independently significant and jointly informative. The system was trained and evaluated on a large-scale, multi-site resting-state EEG dataset, using random forests for both the classification and regression tasks. SHAP, an explainable artificial intelligence technique, was also applied post hoc to identify the key electrophysiological features and brain regions associated with MDD mechanisms, further increasing interpretability.
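The abstract does not spell out how the p-value and SHAP criteria are combined, so the following is only a minimal sketch of one plausible reading: keep features whose univariate p-value falls below a threshold, then rank the survivors by mean absolute SHAP value. To stay dependency-free it computes exact Shapley values by brute force for a toy function rather than using a tree explainer on a random forest. The names `shapley_values`, `hybrid_select`, `toy_model`, the feature labels `f0` to `f2`, the p-values, the threshold `alpha`, and `top_k` are all hypothetical and not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x.

    Features outside a coalition are replaced by their baseline value
    (a single-reference value function, assumed here for simplicity).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def hybrid_select(p_values, mean_abs_shap, alpha=0.05, top_k=2):
    """Keep features with p < alpha, then take the top_k by mean |SHAP|."""
    significant = [name for name, p in p_values.items() if p < alpha]
    ranked = sorted(significant, key=lambda name: mean_abs_shap[name], reverse=True)
    return ranked[:top_k]


def toy_model(v):
    """Hypothetical stand-in for a trained model, with one interaction term."""
    a, b, c = v
    return 2.0 * a - 1.0 * b + 0.5 * a * c


names = ["f0", "f1", "f2"]
samples = [[0.3, 0.5, 1.0], [0.1, 0.7, 2.0], [0.4, 0.2, 1.5]]
baseline = [sum(col) / len(samples) for col in zip(*samples)]

# Mean |SHAP| per feature across all samples (the "jointly informative" score).
per_sample = [shapley_values(toy_model, s, baseline) for s in samples]
mean_abs_shap = {
    name: sum(abs(phi[k]) for phi in per_sample) / len(per_sample)
    for k, name in enumerate(names)
}

# Hypothetical univariate p-values (the "independently significant" score).
p_values = {"f0": 0.001, "f1": 0.20, "f2": 0.03}

selected = hybrid_select(p_values, mean_abs_shap)
print(selected)  # -> ['f0', 'f2']
```

In this toy case `f1` has a larger mean |SHAP| than `f2` but is dropped because it fails the significance filter, which is the behaviour a "both criteria" rule implies. Brute-force enumeration grows exponentially with the number of features, so a real pipeline would rely on a tree-specific explainer instead.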
Results: The proposed system achieved a ten-fold cross-validated classification accuracy of 74.5% (95% CI [70.97%, 78.80%], p < 0.001) and a correlation coefficient of 0.56 (95% CI [0.407, 0.683], p < 0.001) for severity estimation. SHAP analysis identified consistent, clinically meaningful EEG features and pointed to the left occipital and parietal lobes as critical disease-related brain areas. Key features included relative beta power in the left parietal lobe, time-domain features at the parietal midline, the 1/f intercept, left occipital relative beta power, and global brain alpha energy.
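Two of the feature types named above, relative band power and the 1/f intercept, can be illustrated on a power spectrum. The sketch below is not the authors' feature extraction: it assumes a beta band of 13 to 30 Hz, a 1 to 45 Hz total range, and a plain least-squares line fit in log-log space for the aperiodic component, none of which are stated in the abstract. The function names and the synthetic spectrum are hypothetical.

```python
from math import log10


def relative_band_power(freqs, psd, band, total=(1.0, 45.0)):
    """Power in `band` divided by power in the `total` range (both in Hz).

    Assumes an evenly spaced frequency grid, so sums stand in for integrals.
    """
    def band_sum(lo, hi):
        return sum(p for f, p in zip(freqs, psd) if lo <= f < hi)

    return band_sum(*band) / band_sum(*total)


def aperiodic_fit(freqs, psd, fit_range=(1.0, 45.0)):
    """Least-squares fit of log10(P) = intercept - exponent * log10(f)."""
    pts = [
        (log10(f), log10(p))
        for f, p in zip(freqs, psd)
        if fit_range[0] <= f <= fit_range[1]
    ]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pts) / sum(
        (x - mean_x) ** 2 for x, _ in pts
    )
    intercept = mean_y - slope * mean_x
    return intercept, -slope


# Synthetic spectrum on a 0.5 Hz grid: pure 1/f background, P(f) = 10**1 / f**1.5
freqs = [0.5 * k for k in range(2, 91)]  # 1.0 Hz .. 45.0 Hz
psd = [10.0 / f ** 1.5 for f in freqs]

intercept, exponent = aperiodic_fit(freqs, psd)
rel_beta = relative_band_power(freqs, psd, band=(13.0, 30.0))
print(round(intercept, 6), round(exponent, 6))  # -> 1.0 1.5
```

On real EEG the spectrum also contains oscillatory peaks, so a straight-line fit over the whole range is biased by them; dedicated spectral parameterization tools separate the periodic and aperiodic parts before reporting an intercept.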
Conclusion: This study proposes a data-centric, interpretable depression grading system built on large-scale, multi-site EEG data, using simple models and hybrid feature selection to emphasize explainability, generalizability, and data fidelity. By shifting the focus from algorithmic complexity to data transparency and feature-level insight, the model offers a practical and trustworthy path toward real-world mental health assessment.
About the journal:
Journal of NeuroEngineering and Rehabilitation considers manuscripts on all aspects of research that result from cross-fertilization of the fields of neuroscience, biomedical engineering, and physical medicine & rehabilitation.