Acoustic Characteristics of Emotional Speech Using Spectrogram Image Classification

Melissa Stola, M. Lech, R. Bolia, Michael Skinner
{"title":"Acoustic Characteristics of Emotional Speech Using Spectrogram Image Classification","authors":"Melissa Stola, M. Lech, R. Bolia, Michael Skinner","doi":"10.1109/ICSPCS.2018.8631752","DOIUrl":null,"url":null,"abstract":"One of the problems limiting the accuracy of speech emotion recognition (SER) is difficulty in the differentiation between acoustically-similar emotions. Since it is not clear how emotions differ in acoustic terms, it is difficult to design new, more efficient SER strategies. In this study, amplitude-frequency analysis of emotional speech was performed to determine relative differences between seven emotional categories of speech in the Berlin Emotional Speech (EMO-DB) database. The analysis transformed short J-second blocks of speech into RGB images of spectrograms using four different frequency scales. The images were used to train a convolutional neural network (CNN) to recognize emotions. By training the network with different combinations of frequency scales and color components of the RGB images that emphasized different frequency and spectral amplitude values, links between different emotions and corresponding amplitude-frequency characteristics of speech were determined.","PeriodicalId":179948,"journal":{"name":"2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSPCS.2018.8631752","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

One of the problems limiting the accuracy of speech emotion recognition (SER) is the difficulty of differentiating between acoustically similar emotions. Since it is not clear how emotions differ in acoustic terms, it is difficult to design new, more efficient SER strategies. In this study, amplitude-frequency analysis of emotional speech was performed to determine relative differences between seven emotional categories of speech in the Berlin Emotional Speech (EMO-DB) database. The analysis transformed short J-second blocks of speech into RGB spectrogram images using four different frequency scales. The images were used to train a convolutional neural network (CNN) to recognize emotions. By training the network with different combinations of frequency scales and color components of the RGB images that emphasized different frequency and spectral amplitude values, links between different emotions and the corresponding amplitude-frequency characteristics of speech were determined.
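The core transformation the abstract describes, converting a short block of speech into a spectrogram image suitable for CNN input, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Hann-windowed short-time FFT, uses the mel scale as a stand-in for one of the paper's four (unspecified here) frequency scales, and renders a single grayscale channel rather than the paper's RGB images.

```python
import numpy as np

def mel_scale(f):
    """Hz -> mel, one common perceptual frequency scale (the paper's
    exact four scales are not specified here; this is an assumption)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def spectrogram(signal, win_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT.
    Returns an array of shape (freq_bins, time_frames)."""
    window = np.hanning(win_len)
    frames = [signal[i:i + win_len] * window
              for i in range(0, len(signal) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Toy 1-second "speech block": a 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
block = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(block)

# Compress dynamic range and map to 0-255 so the array can be written
# out as one image channel for a CNN.
img = np.log1p(spec)
img = (255 * (img - img.min()) / (img.max() - img.min())).astype(np.uint8)
```

Emphasizing different amplitude ranges, as the paper does through the RGB color components, would amount to applying different nonlinear mappings (in place of the single `log1p` above) before quantizing each channel.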