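Hearing vocals to recognize schizophrenia: speech discriminant analysis with fusion of emotions and features based on deep learning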

IF 3.4 | Zone 2 (Medicine) | JCR Q2 (PSYCHIATRY)
Jie Huang, Yanli Zhao, Zhanxiao Tian, Wei Qu, Xia Du, Jie Zhang, Meng Zhang, Yunlong Tan, Zhiren Wang, Shuping Tan
BMC Psychiatry, 25(1): 466. Published 2025-05-08. DOI: 10.1186/s12888-025-06888-z
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12060412/pdf/

Abstract


Background and objective: Schizophrenia is a complex and heterogeneous mental disorder, and detecting it accurately remains a major challenge. Current diagnostic criteria rely primarily on clinical symptoms, which may not fully capture individual differences or the heterogeneity of the disorder. This study develops a deep learning-based discriminative model of schizophrenic speech that combines different emotional stimuli and features.

Methods: A total of 156 patients with schizophrenia and 74 healthy controls took part. Each participant read aloud three fixed texts with different emotional content (positive, neutral and negative). Log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) were extracted with the librosa 0.9.2 toolkit, and convolutional neural networks (CNNs) were used to analyze the log-Mel spectrograms. The study examined how the type of emotional stimulus affected schizophrenia detection. It also examined whether fusing demographic information and MFCCs into the model improved detection.
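Below is a minimal NumPy sketch of the two feature types named in the Methods. It follows their textbook definitions:

1. A short-time power spectrum.
2. A triangular mel filterbank.
3. Log (dB) compression, giving the log-Mel spectrogram.
4. An orthonormal DCT-II across the mel bands, giving the MFCCs.

This is not librosa's implementation, and values will differ from the paper's. librosa's `librosa.feature.melspectrogram`, `librosa.power_to_db` and `librosa.feature.mfcc` use a different setup by default: the Slaney mel scale, area-normalized filters, a periodic Hann window and centered frames. The 512-sample frame, 160-sample hop (10 ms at 16 kHz), 40 mel bands and 13 MFCCs below are illustrative assumptions, not the settings used in the study.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumption); librosa defaults to the Slaney variant instead
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft//2 + 1 FFT bins onto n_mels mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge points, equally spaced on the mel scale, converted back to Hz
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_and_mfcc(y, sr, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    """Return (log_mel [n_mels, T] in dB, mfcc [n_mfcc, T]) for a mono signal y."""
    if len(y) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)  # symmetric Hann; librosa uses the periodic variant
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # [T, n_fft//2 + 1]
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T          # [T, n_mels]
    log_mel = 10.0 * np.log10(np.maximum(mel, 1e-10))          # power -> dB, floored
    # Orthonormal DCT-II across the mel axis turns log-mel energies into cepstral coefficients
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi / n_mels * (n + 0.5) * k)
    dct[0] /= np.sqrt(2.0)
    mfcc = log_mel @ dct.T                                       # [T, n_mfcc]
    return log_mel.T, mfcc.T


# Usage on a synthetic 1-second, 440 Hz tone sampled at 16 kHz
sr = 16000
y = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
log_mel, mfcc = log_mel_and_mfcc(y, sr)
print(log_mel.shape, mfcc.shape)  # (40, 97) (13, 97)
```

The log-Mel matrix is a 2-D time-frequency array, which is the kind of image-like input a CNN consumes. The MFCC matrix is usually summarized over time (for example, by its mean and standard deviation) before it is fused with other features.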

Results: Discrimination performance was better for the neutral emotional stimulus than for the positive and negative stimuli. Combining the different emotional stimuli and fusing the speech features with personal (demographic) information improved both sensitivity and specificity. The best discriminant model achieved an accuracy of 91.7%, a sensitivity of 94.9%, a specificity of 85.1% and a ROC-AUC of 0.963.
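These four figures follow the standard definitions for a binary classifier, with schizophrenia as the positive class:

- Sensitivity = TP/(TP+FN).
- Specificity = TN/(TN+FP).
- Accuracy = (TP+TN)/N.
- ROC-AUC is the probability that a randomly chosen patient receives a higher score than a randomly chosen control.

A plain-Python sketch follows. Its confusion counts are a hypothetical reconstruction, not data from the paper. Suppose the metrics were computed over all 230 participants, for example from pooled cross-validation predictions (the abstract does not say so). Then 148/156 ≈ 94.9%, 63/74 ≈ 85.1% and (148 + 63)/230 = 211/230 ≈ 91.7% are the only integer counts that reproduce all three percentages. The ROC-AUC of 0.963 cannot be checked this way, because it depends on the model's continuous scores.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity with label 1 = schizophrenia, 0 = healthy control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate among patients
        "specificity": tn / (tn + fp),  # true-negative rate among controls
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the Mann-Whitney statistic: the probability that a randomly chosen
    patient scores higher than a randomly chosen control (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts: 148 of 156 patients and 63 of 74 controls classified correctly
y_true = [1] * 156 + [0] * 74
y_pred = [1] * 148 + [0] * 8 + [0] * 63 + [1] * 11
for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {100 * value:.1f}%")
# accuracy: 91.7%
# sensitivity: 94.9%
# specificity: 85.1%
```

The pairwise AUC is O(P·N), which is fine at this sample size. Library implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.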

Conclusions: Under neutral emotional stimulation, speech differed more between schizophrenia patients and healthy controls, which made the two groups easier to discriminate. Integrating different emotions, demographic information and MFCCs improved the accuracy of schizophrenia detection. This study provides a methodological foundation for building personalized speech-based detection models for schizophrenia.
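The abstract does not specify how emotions, demographic information and MFCCs were fused. Two common schemes exist:

- Feature-level (early) fusion concatenates each participant's feature vectors before a single classifier.
- Decision-level (late) fusion averages the probabilities predicted separately for each emotional condition.

The NumPy sketch below shows both. The 64-dimensional CNN embeddings, the demographic fields (age, sex, years of education) and the probabilities are hypothetical placeholders, not choices or values from the paper.

```python
import numpy as np


def mfcc_summary(mfcc):
    """Collapse a [n_mfcc, T] MFCC matrix into a fixed-length vector: per-coefficient mean and std over time."""
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def early_fusion(cnn_embeddings, mfcc, demographics):
    """Feature-level fusion: concatenate CNN embeddings (one per emotional condition),
    MFCC statistics and demographic covariates into a single vector."""
    parts = [np.ravel(e) for e in cnn_embeddings]
    parts.append(mfcc_summary(mfcc))
    parts.append(np.asarray(demographics, dtype=float))
    return np.concatenate(parts)


def standardize(train, test):
    """Z-score each column with training-set statistics only, so test data never leaks into scaling."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns: avoid division by zero
    return (train - mu) / sd, (test - mu) / sd


def late_fusion(condition_probs):
    """Decision-level fusion: average P(schizophrenia) predicted separately for each emotional condition."""
    return float(np.mean(condition_probs))


rng = np.random.default_rng(0)
embeddings = [rng.normal(size=64) for _ in ("positive", "neutral", "negative")]  # hypothetical CNN outputs
mfcc = rng.normal(size=(13, 97))   # hypothetical MFCC matrix
demographics = [35.0, 1.0, 12.0]   # hypothetical: age, sex, years of education
x = early_fusion(embeddings, mfcc, demographics)
print(x.shape)  # (221,)
print(late_fusion([0.62, 0.91, 0.70]))
```

Standardization matters for early fusion. Demographic covariates such as age sit on a very different scale from network embeddings, so a classifier trained on the raw concatenation could be dominated by whichever block has the largest values.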

Source journal: BMC Psychiatry (Medicine: Psychiatry)
CiteScore: 5.90
Self-citation rate: 4.50%
Articles published: 716
Review time: 3-6 weeks
About the journal: BMC Psychiatry is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of psychiatric disorders, as well as related molecular genetics, pathophysiology and epidemiology.