Dimensional Speech Emotion Recognition from Acoustic and Text Features using Recurrent Neural Networks

Bagus Tris Atmaja, Reda Elbarougy, M. Akagi
International Journal of Informatics, Information System and Computer Engineering (INJIISCOM)
DOI: 10.34010/injiiscom.v1i1.4023
Citations: 1

Abstract

Emotion can be inferred from tonal and verbal information, both of which can be extracted from speech. While most researchers have studied categorical emotion recognition from a single modality, this research presents dimensional emotion recognition combining acoustic and text features. Thirty-one acoustic features are extracted from speech, while word vectors are used as text features. The initial results on single-modality emotion recognition serve as a cue for combining both feature sets to improve the recognition result. The combined result shows that fusing acoustic and text features decreases the error of dimensional emotion score prediction by about 5% relative to the acoustic-only system and 1% relative to the text-only system. This smallest error is achieved by modeling the text with Long Short-Term Memory (LSTM) networks and the acoustics with bidirectional LSTM networks, then concatenating both systems with dense networks.
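The fusion architecture described above (a bidirectional LSTM over acoustic features, a unidirectional LSTM over word vectors, and dense layers over the concatenated outputs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hidden sizes, sequence lengths, word-vector dimension, and the assumption of three output dimensions (e.g., valence, arousal, dominance) are all assumptions made here for illustration.

```python
import torch
import torch.nn as nn


class BimodalEmotionModel(nn.Module):
    """Sketch of the fusion scheme from the abstract: a BiLSTM acoustic
    branch, an LSTM text branch, and a dense head over their concatenation.
    Layer sizes and the 3-dimensional output are illustrative assumptions."""

    def __init__(self, n_acoustic=31, emb_dim=300, hidden=64):
        super().__init__()
        # Acoustic branch: bidirectional LSTM over 31 acoustic features per frame
        self.acoustic_lstm = nn.LSTM(n_acoustic, hidden,
                                     batch_first=True, bidirectional=True)
        # Text branch: unidirectional LSTM over word vectors
        self.text_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Dense fusion head; 3 outputs assumed (dimensional emotion scores)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, acoustic, text):
        # Take the final hidden states of each branch
        _, (h_a, _) = self.acoustic_lstm(acoustic)  # h_a: (2, B, hidden)
        _, (h_t, _) = self.text_lstm(text)          # h_t: (1, B, hidden)
        a = torch.cat([h_a[0], h_a[1]], dim=-1)     # join both BiLSTM directions
        fused = torch.cat([a, h_t[-1]], dim=-1)     # concatenate modalities
        return self.head(fused)


model = BimodalEmotionModel()
# Dummy batch: 4 utterances, 100 acoustic frames, 20 tokens
scores = model(torch.randn(4, 100, 31), torch.randn(4, 20, 300))
print(scores.shape)  # torch.Size([4, 3])
```

Late fusion by concatenating branch outputs, as here, lets each modality keep its own recurrent encoder; this matches the abstract's description of combining the two systems with dense networks.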