Performance of Automatic Speech Analysis in Detecting Depression: Systematic Review and Meta-Analysis.

IF 5.8; Zone 2, Medicine; Q1, Psychiatry
JMIR Mental Health. Pub Date: 2025-10-22. DOI: 10.2196/67802
Patricia Laura Maran, María Dolores Braquehais, Alexandra Vlaic, María Teresa Alonzo-Castillo, Júlia Vendrell-Serres, Josep Antoni Ramos-Quiroga, Amanda Rodríguez-Urrutia
JMIR Mental Health, volume 12, e67802. Publication type: Journal Article.

Abstract

Background: Despite the high prevalence and significant burden of depression, underdiagnosis remains a persistent challenge. Automatic speech analysis (ASA) has emerged as a promising method for depression assessment. However, a comprehensive quantitative synthesis evaluating its diagnostic accuracy is still lacking.

Objective: This systematic review and meta-analysis aimed to assess the diagnostic performance of ASA in detecting depression, considering both machine learning and deep learning approaches.

Methods: We conducted a systematic search across 8 databases (MEDLINE, PsycInfo, Embase, CINAHL, IEEE Xplore, ACM Digital Library, Scopus, and Google Scholar) from January 2013 to April 1, 2025. We included studies published in English that evaluated the accuracy of ASA for detecting depression and reported performance metrics such as accuracy, sensitivity, specificity, precision, or confusion matrices. Study quality was assessed using a modified version of the Quality Assessment of Studies of Diagnostic Accuracy-Revised. A 3-level meta-analysis was performed to estimate the pooled highest and lowest accuracy, sensitivity, specificity, and precision. Meta-regressions and subgroup analyses were performed to explore heterogeneity across various factors, including type of publication, artificial intelligence algorithms, speech features, speech-eliciting tasks, ground truth assessment, validation approach, dataset, dataset language, participants' mean age, and sample size.
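
The four performance metrics named above can all be derived from a study's confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' extraction code; the class name `ConfusionMatrix`, the function `diagnostic_metrics`, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary depressed / not-depressed classifier (hypothetical container)."""
    tp: int  # depressed, classified as depressed
    fp: int  # not depressed, classified as depressed
    tn: int  # not depressed, classified as not depressed
    fn: int  # depressed, classified as not depressed


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and precision from counts."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),  # true positive rate (recall)
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),  # true negative rate
        "precision": _ratio(cm.tp, cm.tp + cm.fp),    # positive predictive value
    }


# Illustrative counts only; these are not data from any included study.
example = ConfusionMatrix(tp=40, fp=10, tn=40, fn=10)
metrics = diagnostic_metrics(example)
```

A study that reports only a confusion matrix can therefore still contribute all four metrics to a synthesis, which is presumably why confusion matrices appear among the accepted outcomes.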

Results: Of the 1345 records identified, 105 studies met the inclusion criteria. The pooled means of the highest accuracy, sensitivity, specificity, and precision were 0.81 (95% CI 0.79 to 0.83), 0.84 (95% CI 0.81 to 0.86), 0.83 (95% CI 0.79 to 0.86), and 0.81 (95% CI 0.77 to 0.84), respectively, whereas the pooled means of the lowest accuracy, sensitivity, specificity, and precision were 0.66 (95% CI 0.63 to 0.69), 0.63 (95% CI 0.58 to 0.68), 0.60 (95% CI 0.55 to 0.66), and 0.64 (95% CI 0.58 to 0.70), respectively.

Conclusions: ASA shows promise as a method for detecting depression, though its readiness for clinical application as a standalone tool remains limited. At present, it should be regarded as a complementary method, with potential applications across diverse contexts. Further high-quality, peer-reviewed studies are needed to support the development of robust, generalizable models and to advance this emerging field.

Trial registration: PROSPERO CRD42023444431; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023444431.

Source journal: JMIR Mental Health (Medicine: Psychiatry and Mental Health)
CiteScore: 10.80
Self-citation rate: 3.80%
Articles published: 104
Review time: 16 weeks
Journal description: JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175). JMIR Mental Health focusses on digital health and Internet interventions, technologies and electronic innovations (software and hardware) for mental health, addictions, online counselling and behaviour change. This includes formative evaluation and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.