Investigation of noise-reverberation-robustness of modulation spectral features for speech-emotion recognition

Taiyang Guo, Sixia Li, M. Unoki, S. Okada
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Published: 2022-11-07
DOI: 10.23919/APSIPAASC55919.2022.9980032 (https://doi.org/10.23919/APSIPAASC55919.2022.9980032)
Citations: 1

Abstract

Speech-emotion recognition (SER) in noisy reverberant environments is a fundamental technique for real-world applications such as call-center services and the diagnosis of psychological diseases. In everyday auditory environments with noise and reverberation, however, previous studies using acoustic features could not achieve the emotion-recognition rates obtained in an ideal experimental environment (no noise and no reverberation). To close this gap, features that are robust against noise and reverberation are needed for SER. By contrast, it has been shown that an everyday noisy reverberant environment (signal-to-noise ratio greater than 10 dB and reverberation time less than 1.0 s) does not affect humans' vocal-emotion recognition. On the basis of the human auditory system, previous research proposed modulation spectral features (MSFs) that contribute to vocal-emotion recognition by humans, so using MSFs has the potential to improve SER in noisy reverberant environments. We investigated the effectiveness and robustness of MSFs for SER in such environments. As speech data, we used noise-vocoded speech (synthesized speech that retains the emotional components of speech signals) in noisy reverberant environments, and we used a support vector machine as the classifier for emotion recognition. The experimental results indicate that, compared with two widely used feature sets, MSFs improved recognition accuracy in 13 of the 26 environments, with an average improvement of 11.38%. Thus, MSFs contribute to SER and are robust against noise and reverberation.
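
The abstract does not say how the MSFs are computed. As a rough illustration of the underlying idea only, a modulation spectrum can be thought of as the spectrum of a signal's slowly varying temporal envelope. The sketch below is not the authors' feature extraction: it assumes a single full-band envelope obtained by block-averaged rectification (no auditory or modulation filterbank), and the function names, the toy signal, and all parameter values are hypothetical.

```python
import cmath
import math


def temporal_envelope(signal, block_size):
    """Crude envelope: mean absolute amplitude over non-overlapping blocks."""
    n_blocks = len(signal) // block_size
    return [
        sum(abs(s) for s in signal[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


def modulation_spectrum(envelope, envelope_rate):
    """Magnitude DFT of the mean-removed envelope.

    Returns (modulation_frequency_hz, magnitude) pairs up to the Nyquist bin.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered)
        )
        spectrum.append((k * envelope_rate / n, abs(coeff)))
    return spectrum


def dominant_modulation_frequency(signal, sample_rate, block_size):
    """Modulation frequency (Hz) with the largest envelope-spectrum magnitude."""
    envelope = temporal_envelope(signal, block_size)
    envelope_rate = sample_rate / block_size  # envelope samples per second
    spectrum = modulation_spectrum(envelope, envelope_rate)
    return max(spectrum, key=lambda pair: pair[1])[0]


# Toy input (hypothetical values): a 1 kHz carrier amplitude-modulated at 4 Hz.
SAMPLE_RATE = 8000
toy_signal = [
    (1.0 + 0.8 * math.cos(2 * math.pi * 4 * n / SAMPLE_RATE))
    * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)  # one second of samples
]

peak_hz = dominant_modulation_frequency(toy_signal, SAMPLE_RATE, block_size=80)
print(peak_hz)
```

For this toy signal the envelope spectrum peaks at the 4 Hz modulation rate rather than at the 1 kHz carrier, which is the kind of slow temporal structure that modulation-domain features summarize. A practical feature set would typically compute such spectra per frequency band and derive statistics from them before passing the result to a classifier such as a support vector machine.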