Vocal Markers of Schizophrenia: Assessing the Generalizability of Machine Learning Models and Their Clinical Applicability.

Impact factor 4.8; Zone 1 (Medicine); JCR Q1 (Psychiatry)

Schizophrenia Bulletin Pub Date : 2025-08-18 DOI:10.1093/schbul/sbaf124

Alberto Parola, Emil Trenckner Jessen, Astrid Rybner, Marie Damsgaard Mortensen, Stine Nyhus Larsen, Arndis Simonsen, Jessica Mary Lin, Yuan Zhou, Huiling Wang, Katja Koelkebeck, Konstantinos Sechidis, Vibeke Bliksted, Riccardo Fusaroli


Citations: 0

Abstract

Background and hypothesis: Machine learning (ML) models have been argued to reliably predict diagnosis and symptoms of schizophrenia based on voice data only. However, it is unclear to what extent such ML markers would generalize to different clinical samples and different languages, a crucial assessment to move toward clinical applicability. In this study, we systematically assessed the generalizability of current ML models of vocal markers of schizophrenia across contexts and languages.

Study design: We trained models relying on a large cross-linguistic dataset (Danish, German, Chinese) of 217 patients with schizophrenia and 221 controls, and used a conservative pipeline to minimize overfitting. We tested the models' generalizability on: (Q1) new participants, speaking the same language; (Q2) new participants, speaking a different language; (Q3-Q4) further, we assessed whether training on data with multiple languages would improve generalizability using Mixture of Experts (MoE) and multilingual models.
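The Q1 and Q2 test conditions can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (`participant`, `language`, `label`), the function names, the 20% hold-out fraction, and the toy data are all assumptions made for illustration. The one point it aims to show is that held-out data is defined by participant (Q1) or by language (Q2), so no speaker appears on both sides of a split.

```python
import random


def within_language_split(records, language, test_fraction=0.2, seed=0):
    """Q1-style split: hold out unseen participants who speak `language`."""
    same_lang = [r for r in records if r["language"] == language]
    ids = sorted({r["participant"] for r in same_lang})
    random.Random(seed).shuffle(ids)  # deterministic shuffle of participant ids
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    # Split by participant, not by recording, so no speaker leaks across sets.
    train = [r for r in same_lang if r["participant"] not in test_ids]
    test = [r for r in same_lang if r["participant"] in test_ids]
    return train, test


def cross_language_split(records, train_language, test_language):
    """Q2-style split: train on one language, test on a different one."""
    train = [r for r in records if r["language"] == train_language]
    test = [r for r in records if r["language"] == test_language]
    return train, test


# Hypothetical toy data: 10 participants per language, 2 recordings each.
records = [
    {"participant": f"{lang}-{i}", "language": lang, "label": i % 2}
    for lang in ("Danish", "German", "Chinese")
    for i in range(10)
    for _ in range(2)
]

q1_train, q1_test = within_language_split(records, "Danish")
q2_train, q2_test = cross_language_split(records, "Danish", "German")
print(len(q1_train), len(q1_test), len(q2_train), len(q2_test))
```

A model would then be fitted on each `train` list and scored on the matching `test` list; comparing the Q1 and Q2 scores is what reveals a drop in cross-language generalizability.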

Results: Model performance was comparable to state-of-the-art findings (F1-score ~0.75) within the same language; however, models did not generalize well, showing a substantial decrease in performance when tested on new languages. The performance of MoE and multilingual models was generally low (F1-score ~0.50).
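For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on made-up labels; whether the study reports this variant or an averaged one (for example, macro-F1) is not stated in the abstract, so treat the choice as an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = patient, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)  # 3 TP, 1 FP, 1 FN -> precision = recall = 0.75
print(score)
```

In this toy case one patient is missed and one control is misclassified, which gives an F1 of 0.75, the same order as the within-language figure reported above.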

Conclusions: Overall, the cross-linguistic generalizability of vocal markers of schizophrenia is limited. We argue that more emphasis should be placed on collecting large open cross-linguistic datasets to systematically test the generalizability of voice-based ML models, and on identifying more precise mechanisms of how the clinical features of schizophrenia are expressed in language and voice, and how different languages vary in that expression.




Source journal

Schizophrenia Bulletin (Medicine: Psychiatry)

CiteScore: 11.40

Self-citation rate: 6.10%

Articles published: 163

Review time: 4-8 weeks

About the journal: Schizophrenia Bulletin seeks to review recent developments and empirically based hypotheses regarding the etiology and treatment of schizophrenia. We view the field as broad and deep, and will publish new knowledge ranging from the molecular basis to social and cultural factors. We will give new emphasis to translational reports which simultaneously highlight basic neurobiological mechanisms and clinical manifestations. Some of the Bulletin content is invited as special features or manuscripts organized as a theme by special guest editors. Most pages of the Bulletin are devoted to unsolicited manuscripts of high quality that report original data or where we can provide a special venue for a major study or workshop report. Supplement issues are sometimes provided for manuscripts reporting from a recent conference.