Speech Emotion Recognition Based on Acoustic Segment Model
Siyuan Zheng, Jun Du, Hengshun Zhou, Xue Bai, Chin-Hui Lee, Shipeng Li
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Published: 2021-01-24
DOI: 10.1109/ISCSLP49672.2021.9362119
Citations: 3
Abstract
Accurate detection of emotion from speech is a challenging task due to the variability in both speech and emotion. In this paper, we propose a speech emotion recognition (SER) method based on an acoustic segment model (ASM) to address this issue. Specifically, speech with different emotions is segmented more finely by the ASM. Each acoustic segment is modeled by a hidden Markov model (HMM), and utterances are decoded into ASM sequences in an unsupervised way. Feature vectors are then obtained from these sequences by latent semantic analysis (LSA) and fed to a classifier. Validated on the IEMOCAP corpus, the proposed method outperforms state-of-the-art methods, achieving a weighted accuracy of 73.9% and an unweighted accuracy of 70.8%.
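The back end of the pipeline described above (ASM token sequences → LSA feature vectors → classifier) can be sketched as follows. This is a minimal, illustrative reconstruction, not the authors' implementation: the HMM-based ASM decoding step is assumed to have already produced a token sequence per utterance, the toy sequences and labels are invented, and LSA is realized here as TF-IDF weighting followed by a truncated SVD, with a simple nearest-centroid rule standing in for the paper's classifier.

```python
import numpy as np

def build_vocab(sequences):
    # Map each distinct ASM token to a column index.
    vocab = sorted({tok for seq in sequences for tok in seq})
    return {tok: i for i, tok in enumerate(vocab)}

def tfidf_matrix(sequences, vocab):
    # Term-document matrix: one row per utterance, one column per ASM token.
    n_docs, n_terms = len(sequences), len(vocab)
    tf = np.zeros((n_docs, n_terms))
    for d, seq in enumerate(sequences):
        for tok in seq:
            tf[d, vocab[tok]] += 1.0
        tf[d] /= max(len(seq), 1)          # term frequency
    df = (tf > 0).sum(axis=0)              # document frequency
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
    return tf * idf

def lsa(X, k):
    # Truncated SVD: keep the top-k latent dimensions,
    # scaling left singular vectors by their singular values.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

# Toy ASM sequences for two hypothetical emotion classes.
train_seqs = [["a1", "a2", "a1"], ["a1", "a1", "a3"],
              ["a4", "a5", "a4"], ["a5", "a5", "a4"]]
labels = ["happy", "happy", "sad", "sad"]

vocab = build_vocab(train_seqs)
Z = lsa(tfidf_matrix(train_seqs, vocab), k=2)

# Nearest-centroid classification in the latent space.
centroids = {c: Z[[i for i, l in enumerate(labels) if l == c]].mean(axis=0)
             for c in set(labels)}
pred = min(centroids, key=lambda c: np.linalg.norm(Z[0] - centroids[c]))
```

With these toy sequences, utterances of the same class share ASM tokens and therefore end up close together in the latent space, which is the property LSA is meant to expose for the downstream classifier.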