Detecting schizophrenia, bipolar disorder, psychosis vulnerability and major depressive disorder from 5 minutes of online-collected speech.

IF 5.8 · CAS Zone 1 (Medicine) · JCR Q1 (Psychiatry)

Translational Psychiatry · Pub Date: 2025-07-12 · DOI: 10.1038/s41398-025-03433-0

Julianna Olah, Win Lee Edwin Wong, Atta-Ul Raheem Rana Chaudhry, Omar Mena, Sunny X Tang

Translational Psychiatry, vol. 15(1), 241. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12255794/pdf/

Citations: 0

Abstract

Psychosis poses substantial social and healthcare burdens. The analysis of speech is a promising approach for the diagnosis and monitoring of psychosis, capturing symptoms such as thought disorder and flattened affect. Recent advancements in Natural Language Processing (NLP) methodologies enable the automated extraction of informative speech features, which has been leveraged for early psychosis detection and assessment of symptomology. However, critical gaps persist, including the absence of standardized sample collection protocols, small sample sizes, and a lack of multi-illness classification, limiting clinical applicability. Our study aimed to (1) identify an optimal assessment approach for the online and remote collection of speech in the context of assessing the psychosis spectrum, and (2) evaluate whether a fully automated, speech-based machine learning (ML) pipeline can discriminate among different conditions on the schizophrenia-bipolar spectrum (SSD-BD-SPE), help-seeking comparison subjects (MDD), and healthy controls (HC) at varying layers of analysis and diagnostic complexity. We adopted online data collection methods to collect 20 min of speech and demographic information from individuals. Participants were categorized as "healthy" help-seekers (HC), having a schizophrenia-spectrum disorder (SSD), bipolar disorder (BD), major depressive disorder (MDD), or being on the psychosis spectrum with sub-clinical psychotic experiences (SPE). SPE status was determined based on self-reported clinical diagnosis and responses to the PHQ-8 and PQ-16 screening questionnaires, while other diagnoses were determined based on self-report from participants. Linguistic and paralinguistic features were extracted, and ensemble learning algorithms (e.g., XGBoost) were used to train models. A 70-30% train-test split and 30-fold cross-validation were used to validate model performance. The final analysis sample included 1140 individuals and 22,650 min of speech. Using 5 min of speech, our model could discriminate between HC and those with a serious mental illness (SSD or BD) with 86% accuracy (AUC = 0.91, Recall = 0.7, Precision = 0.98). Furthermore, our model could discern among HC, SPE, BD and SSD groups with 86% accuracy (F1 macro = 0.855, Recall Macro = 0.86, Precision Macro = 0.86). Finally, in a 5-class discrimination task including individuals with MDD, our model had 76% accuracy (F1 macro = 0.757, Recall Macro = 0.758, Precision Macro = 0.766). Our ML pipeline demonstrated disorder-specific learning, achieving excellent or good accuracy across several classification tasks. We demonstrated that the screening of mental disorders is possible via a fully automated, remote speech assessment pipeline. We tested our model on a relatively high number of conditions (5 classes) relative to the literature and in a stratified sample of the psychosis spectrum, including HC, SPE, SSD and BD (4 classes). We tested our model on a large sample (N = 1150) and demonstrated best-in-class accuracy with remotely collected speech data in the psychosis spectrum; however, further clinical validation is needed to test the reliability of model performance.
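The multi-class results above are reported as accuracy together with macro-averaged precision, recall and F1. As a reading aid, the sketch below shows how such macro averages are commonly computed from true and predicted labels. It is a minimal illustration and not the authors' pipeline: the function name `macro_metrics`, the toy label lists, and the choice to average per-class F1 scores (rather than derive F1 from the macro precision and recall) are all assumptions.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper for illustration only; the abstract does not
    state how its macro scores were computed.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision_macro": sum(precisions) / n,
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,  # mean of per-class F1 (assumed convention)
    }


# Toy 4-class example using the group abbreviations from the abstract.
# These labels are invented for illustration, not study data.
toy_true = ["HC", "HC", "SPE", "SPE", "BD", "BD", "SSD", "SSD"]
toy_pred = ["HC", "HC", "SPE", "BD", "BD", "BD", "SSD", "HC"]
toy_scores = macro_metrics(toy_true, toy_pred)
```

Because macro averaging weights every class equally regardless of its size, a model that performs poorly on a small group (for example, one diagnostic class with few participants) is penalized as heavily as one that performs poorly on a large group.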


Source journal

Translational Psychiatry (Psychiatry)

CiteScore: 11.50

Self-citation rate: 2.90%

Articles published: 484

Review time: 23 weeks

Journal description: Psychiatry has suffered tremendously from the limited translational pipeline. Nobel laureate Julius Axelrod's discovery in 1961 of monoamine reuptake by pre-synaptic neurons still forms the basis of contemporary antidepressant treatment. There is a grievous gap between the explosion of knowledge in neuroscience and conceptually novel treatments for our patients. Translational Psychiatry bridges this gap by fostering and highlighting the pathway from discovery to clinical applications, healthcare and global health. We view translation broadly as the full spectrum of work that marks the pathway from discovery to global health, inclusive. The steps of translation that are within the scope of Translational Psychiatry include (i) fundamental discovery, (ii) bench to bedside, (iii) bedside to clinical applications (clinical trials), (iv) translation to policy and health care guidelines, (v) assessment of health policy and usage, and (vi) global health. All areas of medical research, including but not restricted to molecular biology, genetics, pharmacology, imaging and epidemiology, are welcome as they contribute to enhancing the field of translational psychiatry.