Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis.

IF 4.7 2区医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association Pub Date : 2024-10-01 DOI:10.1093/jamia/ocae189

Lidan Liu, Lu Liu, Hatem A Wafa, Florence Tydeman, Wanqing Xie, Yanzhong Wang

{"title":"Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis.","authors":"Lidan Liu, Lu Liu, Hatem A Wafa, Florence Tydeman, Wanqing Xie, Yanzhong Wang","doi":"10.1093/jamia/ocae189","DOIUrl":null,"url":null,"abstract":"Objective: This study aims to conduct a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) using speech samples in depression.Materials and methods: This review included studies reporting diagnostic results of DL algorithms in depression using speech data, published from inception to January 31, 2024, on PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science databases. Pooled accuracy, sensitivity, and specificity were obtained by random-effect models. The diagnostic Precision Study Quality Assessment Tool (QUADAS-2) was used to assess the risk of bias.Results: A total of 25 studies met the inclusion criteria and 8 of them were used in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy was 0.89 (95% CI, 0.81-0.97) in the handcrafted group.Discussion: To our knowledge, our study is the first meta-analysis on the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, posing problems in deciphering the performance of other DL algorithms. The handcrafted model performed better than the end-to-end model in speech depression detection.Conclusions: The application of DL in speech provided a useful tool for depression detection. CNN models with handcrafted acoustic features could help to improve the diagnostic performance.Protocol registration: The study protocol was registered on PROSPERO (CRD42023423603).","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"2394-2404"},"PeriodicalIF":4.7000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413444/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae189","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Objective: This study aims to conduct a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) using speech samples in depression.

Materials and methods: This review included studies reporting diagnostic results of DL algorithms in depression using speech data, published from inception to January 31, 2024, on PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science databases. Pooled accuracy, sensitivity, and specificity were obtained by random-effect models. The diagnostic Precision Study Quality Assessment Tool (QUADAS-2) was used to assess the risk of bias.

Results: A total of 25 studies met the inclusion criteria and 8 of them were used in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy was 0.89 (95% CI, 0.81-0.97) in the handcrafted group.

Discussion: To our knowledge, our study is the first meta-analysis on the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, posing problems in deciphering the performance of other DL algorithms. The handcrafted model performed better than the end-to-end model in speech depression detection.

Conclusions: The application of DL in speech provided a useful tool for depression detection. CNN models with handcrafted acoustic features could help to improve the diagnostic performance.

Protocol registration: The study protocol was registered on PROSPERO (CRD42023423603).

查看原文本刊更多论文

利用语音样本进行深度学习对抑郁症的诊断准确性：系统综述和荟萃分析。

研究目的本研究旨在对使用语音样本的深度学习（DL）对抑郁症的诊断准确性进行系统综述和荟萃分析：本综述纳入了PubMed、Medline、Embase、PsycINFO、Scopus、IEEE和Web of Science数据库中从开始到2024年1月31日发表的、报告使用语音数据的深度学习算法对抑郁症的诊断结果的研究。通过随机效应模型得出了汇总的准确性、敏感性和特异性。诊断精确性研究质量评估工具（QUADAS-2）用于评估偏倚风险：共有 25 项研究符合纳入标准，其中 8 项用于荟萃分析。抑郁检测模型的准确性、特异性和敏感性的汇总估计值分别为 0.87（95% CI，0.81-0.93）、0.85（95% CI，0.78-0.91）和 0.82（95% CI，0.71-0.94）。按模型结构分层后，手工组的汇总诊断准确率最高，为 0.89（95% CI，0.81-0.97）：据我们所知，我们的研究是首次对 DL 从语音样本中检测抑郁的诊断性能进行荟萃分析。所有纳入荟萃分析的研究都使用了卷积神经网络（CNN）模型，这给解读其他 DL 算法的性能带来了问题。在语音抑郁检测中，手工制作的模型比端到端模型表现更好：在语音中应用 DL 为抑郁检测提供了有用的工具。带有手工制作声学特征的 CNN 模型有助于提高诊断性能：研究方案已在 PROSPERO（CRD42023423603）上注册。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of the American Medical Informatics Association 医学-计算机：跨学科应用

CiteScore

14.50

自引率

7.80%

发文量

230

审稿时长

3-8 weeks

期刊介绍： JAMIA is AMIA''s premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA''s articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.