Predicting affective engagement and mental strain from prosodic speech features.

IF 3.2, Zone 3 (Medicine), JCR Q2 (Psychiatry)
Frontiers in Psychiatry Pub Date : 2025-09-19 eCollection Date: 2025-01-01 DOI:10.3389/fpsyt.2025.1656292
Vaishnavi Prakash Yache, Laura Moradbakhti, Irene Neuner, Tanja Veselinovic
Frontiers in Psychiatry, volume 16, article 1656292. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12491004/pdf/
Citations: 0

Abstract

Predicting affective engagement and mental strain from prosodic speech features.

Background: Emotional resilience (traditionally defined as the capacity to recover from adversity) and cognitive load (the mental effort required to process information) are critical aspects of mental health functioning. Traditional assessment methods, such as physiological sensors and post-task surveys, often disrupt natural behavior and fail to provide real-time insights. Speech prosody, encompassing pitch, intensity, loudness, and voice activity, offers a non-intrusive alternative for evaluating these psychological constructs. However, the relationship between speech prosody, emotional resilience, and cognitive load remains underexplored, particularly in conversational contexts.

Objective: This study proposes proxy measures for these constructs based on self-reported engagement, enjoyment, boredom, and cognitive effort during dyadic conversation. By leveraging the SEWA (Automatic Sentiment Estimation in the Wild) database, developed through a European research project on emotion recognition, the research seeks to develop machine learning models that correlate speech patterns with subjective self-reports of emotional and cognitive states.

Methods: Prosodic features, such as pitch variation, vocal intensity, and voice activity, were extracted from the SEWA database recordings. These features were then normalized to account for inter-speaker variability and used as predictors in machine learning models. Regression and classification models were employed to correlate speech features with subjective self-reports, which served as ground truth for Positive Affective Engagement (as a proxy for emotional resilience) and Perceived Mental Strain (as a proxy for cognitive load). Data from English and German speakers were analyzed separately to account for linguistic and cultural differences.
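
The normalization step described in the Methods can be illustrated with a minimal sketch. The abstract does not say which normalization scheme was used, so per-speaker z-scoring is an assumption here, and the function name `normalize_per_speaker`, the feature names, and the toy values are all hypothetical rather than taken from the study or from SEWA.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_per_speaker(rows, feature_names):
    """Z-score each feature within each speaker (an assumed scheme, not the paper's)."""
    # Group the feature rows by speaker so statistics are computed per speaker.
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row)

    normalized = []
    for speaker, speaker_rows in by_speaker.items():
        # Mean and population standard deviation of every feature for this speaker.
        stats = {}
        for name in feature_names:
            values = [r[name] for r in speaker_rows]
            stats[name] = (mean(values), pstdev(values))
        for r in speaker_rows:
            out = {"speaker": speaker}
            for name in feature_names:
                mu, sigma = stats[name]
                # A feature that never varies for a speaker is mapped to 0.
                out[name] = 0.0 if sigma == 0 else (r[name] - mu) / sigma
            normalized.append(out)
    return normalized


# Hypothetical toy values for two speakers; not real SEWA measurements.
rows = [
    {"speaker": "A", "pitch_hz": 200.0, "loudness": 0.5},
    {"speaker": "A", "pitch_hz": 220.0, "loudness": 0.7},
    {"speaker": "B", "pitch_hz": 110.0, "loudness": 0.4},
    {"speaker": "B", "pitch_hz": 130.0, "loudness": 0.4},
]
normalized = normalize_per_speaker(rows, ["pitch_hz", "loudness"])
```

After this step, speakers with very different baseline pitch end up on a comparable scale, which is the point of correcting for inter-speaker variability before fitting a regression or classification model.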

Outcomes: The study establishes a significant relationship between speech prosody and psychological states, demonstrating that Positive Affective Engagement (as a proxy for emotional resilience) and Perceived Mental Strain (as a proxy for cognitive load) can be effectively predicted through prosodic features. Higher emotional resilience is linked to more discernible prosodic patterns in German speech, such as higher loudness and greater voice probability consistency. In contrast, cognitive load prediction remains consistent across English and German datasets.

Conclusion: This research introduces a novel approach for assessing Positive Affective Engagement (as a proxy for emotional resilience) and Perceived Mental Strain (as a proxy for cognitive load) through speech prosody, highlighting the significant impact of language-specific variations. By combining prosodic features with machine learning techniques, the study offers a promising alternative to traditional psychological assessments. The findings emphasize the need for tailored, multilingual models to accurately estimate psychological states, with potential applications in mental health monitoring, cognitive workload analysis, and human-computer interaction. This work lays the foundation for future innovations in speech-based psychological profiling, advancing our understanding of human emotional and cognitive states in diverse linguistic contexts.

Source journal: Frontiers in Psychiatry (Medicine: Psychiatry and Mental Health)
CiteScore: 6.20
Self-citation rate: 8.50%
Articles published: 2813
Review time: 14 weeks
About the journal: Frontiers in Psychiatry publishes rigorously peer-reviewed research across a wide spectrum of translational, basic and clinical research. Field Chief Editor Stefan Borgwardt at the University of Basel is supported by an outstanding Editorial Board of international researchers. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide. The journal's mission is to use translational approaches to improve therapeutic options for mental illness and consequently to improve patient treatment outcomes.