Classification of Social Signals Using Deep LSTM-based Recurrent Neural Networks

2020 International Conference on Signal Processing and Communications (SPCOM), Pub Date: 2020-07-01, DOI: 10.1109/SPCOM50965.2020.9179516

Himanshu Joshi, Ananya Verma, Amrita Mishra

Citations: 4

Abstract

Non-linguistic speech cues help convey emotion in human communication. In this work, we demonstrate the application of deep long short-term memory (LSTM) recurrent neural networks to frame-wise detection and classification of laughter and filler vocalizations in speech data. Further, we propose a novel classification approach that incorporates cluster information as an additional feature, where the clusters in the dataset are extracted via the k-means clustering algorithm. Extensive simulation results demonstrate that the proposed approach achieves a significant improvement over conventional LSTM-based classification methods. We also study the performance of deep LSTM models obtained by stacking LSTMs. Lastly, for classification of the temporally correlated speech data considered in this work, a comparison with popular machine learning techniques validates the superiority of the proposed LSTM-based scheme.
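The cluster-as-feature idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the cluster information is encoded, how many clusters are used, or which acoustic features make up a frame, so everything below is an assumption: the function names (`kmeans`, `nearest`, `augment`), the one-hot encoding, the value of `k`, and the toy two-dimensional frames are all hypothetical. The sketch only covers the feature-augmentation step; the augmented frames would then be fed to an LSTM classifier, which is not shown.

```python
import random


def nearest(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(frame, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of equal-length feature vectors.

    Assumption: k and the iteration count are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct frames chosen at random.
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        # Assignment step: label each frame with its nearest centroid.
        labels = [nearest(f, centroids) for f in frames]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def augment(frames, centroids):
    """Append a one-hot cluster indicator to every frame's feature vector.

    Assumption: one-hot encoding is one plausible choice, not the paper's.
    """
    k = len(centroids)
    augmented = []
    for f in frames:
        one_hot = [0.0] * k
        one_hot[nearest(f, centroids)] = 1.0
        augmented.append(list(f) + one_hot)
    return augmented


# Toy frames standing in for per-frame acoustic features (hypothetical data).
toy_frames = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.3, 5.0], [0.1, 0.3]]
toy_centroids = kmeans(toy_frames, k=2)
toy_augmented = augment(toy_frames, toy_centroids)
```

Each augmented frame has the original features followed by `k` extra entries, so a sequence model consuming these frames would need its input size increased by `k`.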

