Emotion Recognition in Public Speaking Scenarios Utilising An LSTM-RNN Approach with Attention
Alice Baird, S. Amiriparian, M. Milling, Björn Schuller
2021 IEEE Spoken Language Technology Workshop (SLT), published 2021-01-19
DOI: 10.1109/SLT48900.2021.9383542
Citations: 12
Abstract
Speaking in public can be a cause of fear for many people. Research suggests that there are physical markers, such as an increased heart rate and vocal tremolo, that indicate an individual’s state of wellbeing during a public speech. In this study, we explore the advantages of speech-based features for continuous recognition of the emotional dimensions of arousal and valence during a public speaking scenario. Furthermore, we explore biological signal fusion, and perform cross-language (German and English) analysis by training language-independent models and testing them on speech from various native and non-native speaker groupings. For the emotion recognition task itself, we utilise a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) architecture with a self-attention layer. When utilising audio-only features and testing on non-native Germans speaking German, we achieve at best concordance correlation coefficients (CCC) of 0.640 and 0.491 for arousal and valence, respectively, demonstrating a strong effect for this task from non-native speakers, as well as promise for the suitability of deep learning for continuous emotion recognition in the context of public speaking.
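The paper reports results as the concordance correlation coefficient (CCC), the standard agreement measure for continuous arousal/valence prediction. As a minimal sketch (not the authors' code), CCC can be computed from Lin's definition, which combines correlation with mean- and variance-level agreement:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
    Returns 1 for perfect agreement, 0 for no agreement."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()          # population variance (ddof=0)
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
```

Unlike Pearson correlation, CCC also penalises systematic shifts in mean or scale, which is why it is preferred for time-continuous emotion annotation; a prediction that tracks the gold trace but is offset in level scores lower.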