Time Window Analysis for Automatic Speech Emotion Recognition

2018 International Symposium ELMAR. Pub Date: 2018-09-01. DOI: 10.23919/ELMAR.2018.8534630

Boris Puterka, J. Kacur

Citations: 8

Abstract

In this paper we present a time analysis of speech emotion recognition using a convolutional neural network architecture with spectrograms as speech features. The analyses were performed on a model with two convolutional layers followed by a pooling layer, and one fully connected layer followed by dropout and a softmax output layer. On this model we analyzed the time characteristics of the speech signal represented by spectrograms. The aim of our work was to find the relation between the duration of the speech signal and the recognition rate for seven basic emotions. We found that speech length matters: accuracy grows with the length of the analyzed window, but beyond approximately 1.2 seconds the growth becomes rather mild.
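As a rough illustration of what varying the analysis window means for a spectrogram input, the sketch below converts a window duration into a number of spectrogram frames and crops a spectrogram to that length. It is only a sketch under stated assumptions: the sample rate (16 kHz), frame length (400 samples) and hop (160 samples) are hypothetical values chosen for illustration and are not taken from the paper, and the function names `frames_in_window` and `crop_spectrogram` are likewise invented here.

```python
def frames_in_window(duration_s, sample_rate=16000, frame_len=400, hop=160):
    """Count the full analysis frames that fit in a window of duration_s seconds.

    sample_rate, frame_len and hop are illustrative assumptions,
    not settings reported in the paper.
    """
    n_samples = int(round(duration_s * sample_rate))
    if n_samples < frame_len:
        return 0  # window too short to hold even one frame
    return 1 + (n_samples - frame_len) // hop


def crop_spectrogram(spec, n_frames, start=0):
    """Keep n_frames consecutive time frames, beginning at frame index start.

    spec is a list of time frames; each frame is a list of
    frequency-bin magnitudes.
    """
    if start < 0 or n_frames < 0 or start + n_frames > len(spec):
        raise ValueError("requested window exceeds the spectrogram")
    return spec[start:start + n_frames]


if __name__ == "__main__":
    # Dummy spectrogram: 200 time frames with 3 frequency bins each.
    dummy = [[0.0, 0.0, 0.0] for _ in range(200)]
    for duration in (0.4, 0.8, 1.2):
        n = frames_in_window(duration)
        window = crop_spectrogram(dummy, n)
        print(duration, "s ->", len(window), "frames")
```

With fixed framing parameters, the window duration alone determines the width of the input passed to the network, so an accuracy-versus-duration curve could be obtained by repeating training and evaluation at several crop lengths.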
