High-arousal emotional speech enhances speech intelligibility and emotion recognition in noise.

IF: 2.1 · CAS Zone 2 (Physics & Astrophysics) · JCR Q2 (Acoustics)
Jessica M Alexander, Fernando Llanos
{"title":"High-arousal emotional speech enhances speech intelligibility and emotion recognition in noisea).","authors":"Jessica M Alexander, Fernando Llanos","doi":"10.1121/10.0036812","DOIUrl":null,"url":null,"abstract":"<p><p>Prosodic and voice quality modulations of the speech signal offer acoustic cues to the emotional state of the speaker. In quiet, listeners are highly adept at identifying not only a speaker's words but also the underlying emotional context. Given that distinct vocal emotions possess varying acoustic characteristics, background noise level may differentially impact speech recognition, emotion recognition, or their interaction. To investigate this question, we assessed the effects of three emotional speech styles (angry, happy, neutral) on speech intelligibility and emotion recognition across four different SNR levels. High-arousal emotional speech styles (happy and angry speech) enhanced both speech intelligibility and emotion recognition in noise. However, emotion recognition behavior was not a reliable predictor of speech recognition behavior. Instead, we found a strong correspondence between speech recognition scores and the relative power of the speech-in-noise signal in critical bands derived from the Speech Intelligibility Index. Unsupervised dimensional scaling analysis of emotion recognition patterns revealed that different noise baselines elicit different perceptual cue weighting strategies. Further dimensional scaling analysis revealed that emotion recognition patterns were best predicted by emotion-level differences in harmonic-to-noise ratio and variability around the fundamental frequency. Listeners may thus weight acoustic features differently for recognizing speech versus emotional patterns.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 6","pages":"4085-4096"},"PeriodicalIF":2.1000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Acoustical Society of America","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1121/10.0036812","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
Citations: 0

Abstract

Prosodic and voice quality modulations of the speech signal offer acoustic cues to the emotional state of the speaker. In quiet, listeners are highly adept at identifying not only a speaker's words but also the underlying emotional context. Given that distinct vocal emotions possess varying acoustic characteristics, background noise level may differentially impact speech recognition, emotion recognition, or their interaction. To investigate this question, we assessed the effects of three emotional speech styles (angry, happy, neutral) on speech intelligibility and emotion recognition across four different SNR levels. High-arousal emotional speech styles (happy and angry speech) enhanced both speech intelligibility and emotion recognition in noise. However, emotion recognition behavior was not a reliable predictor of speech recognition behavior. Instead, we found a strong correspondence between speech recognition scores and the relative power of the speech-in-noise signal in critical bands derived from the Speech Intelligibility Index. Unsupervised dimensional scaling analysis of emotion recognition patterns revealed that different noise baselines elicit different perceptual cue weighting strategies. Further dimensional scaling analysis revealed that emotion recognition patterns were best predicted by emotion-level differences in harmonic-to-noise ratio and variability around the fundamental frequency. Listeners may thus weight acoustic features differently for recognizing speech versus emotional patterns.
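The abstract links speech recognition scores to the relative power of the speech-in-noise signal within the critical bands of the Speech Intelligibility Index (SII, ANSI S3.5-1997). As a minimal sketch of that kind of analysis, and not the authors' pipeline, the code below estimates band-wise speech-to-noise power ratios with Welch's PSD method. The band edges follow the 21-band SII critical-band table as commonly tabulated, and the function names (`band_powers`, `relative_band_snr_db`) are illustrative assumptions.

```python
# Sketch: relative speech-vs-noise power in SII-style critical bands.
# Assumes separate speech and noise waveforms sampled at a rate high
# enough to cover the top band (~9.5 kHz). Band edges and helper names
# are assumptions for illustration, not taken from the paper.
import numpy as np
from scipy.signal import welch

# Lower edges of the 21 SII critical bands, plus the top edge (Hz).
CRITICAL_BAND_EDGES = np.array([
    100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
    1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500,
])

def band_powers(x, fs):
    """Per-band power of signal x, integrated from a Welch PSD estimate."""
    f, psd = welch(x, fs=fs, nperseg=2048)
    powers = np.empty(len(CRITICAL_BAND_EDGES) - 1)
    for i, (lo, hi) in enumerate(zip(CRITICAL_BAND_EDGES[:-1],
                                     CRITICAL_BAND_EDGES[1:])):
        mask = (f >= lo) & (f < hi)
        powers[i] = np.trapz(psd[mask], f[mask])  # integrate PSD over band
    return powers

def relative_band_snr_db(speech, noise, fs):
    """Band-wise speech-to-noise power ratio in dB (one value per band)."""
    ps = band_powers(speech, fs)
    pn = band_powers(noise, fs)
    return 10.0 * np.log10(ps / pn)

# Usage (hypothetical): with speech and noise mixed at a fixed broadband
# SNR, the band-wise ratios show which critical bands carry extra audible
# power for a given emotional speech style.
# fs = 22050
# snr_by_band = relative_band_snr_db(speech, noise, fs)
```

Comparing such band-wise ratios across emotional styles (angry, happy, neutral) at a fixed nominal SNR would illustrate how a high-arousal style can yield more audible speech energy in perceptually important bands even when the broadband SNR is identical.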

Source journal
CiteScore: 4.60
Self-citation rate: 16.70%
Articles per year: 1433
Review time: 4.7 months
Journal description: Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.