{"title":"Multimodal emotion recognition in audiovisual communication","authors":"Björn Schuller, M. Lang, G. Rigoll","doi":"10.1109/ICME.2002.1035889","DOIUrl":null,"url":null,"abstract":"This paper discusses innovative techniques to automatically estimate a user's emotional state analyzing the speech signal and haptical interaction on a touch-screen or via mouse. The knowledge of a user's emotion permits adaptive strategies striving for a more natural and robust interaction. We classify seven emotional states: surprise, joy, anger, fear, disgust, sadness, and neutral user state. The user's emotion is extracted by a parallel stochastic analysis of his spoken and haptical machine interactions while understanding the desired intention. The introduced methods are based on the common prosodic speech features pitch and energy, but rely also on the semantic and intention based features wording, degree of verbosity, temporal intention and word rate, and finally the history of user utterances. As further modality even touch-screen or mouse interaction is analyzed. The estimates based on these features are integrated in a multimodal way. The introduced methods are based on results of user studies. A realization proved to be reliable compared with subjective probands' impressions.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"6 1","pages":"745-748 vol.1"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"55","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Multimedia and Expo","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2002.1035889","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 55
Abstract
This paper discusses innovative techniques to automatically estimate a user's emotional state by analyzing the speech signal and the haptic interaction on a touch-screen or via mouse. Knowledge of a user's emotion permits adaptive strategies that strive for a more natural and robust interaction. We classify seven emotional states: surprise, joy, anger, fear, disgust, sadness, and a neutral user state. The user's emotion is extracted by a parallel stochastic analysis of the spoken and haptic machine interactions while understanding the desired intention. The introduced methods are based on the common prosodic speech features pitch and energy, but also rely on the semantic and intention-based features wording, degree of verbosity, temporal intention, word rate, and finally the history of user utterances. As a further modality, touch-screen or mouse interaction is analyzed as well. The estimates based on these features are integrated in a multimodal way. The introduced methods are based on the results of user studies. A realization proved reliable when compared with the subjective impressions of test participants.
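To illustrate the kind of multimodal integration the abstract describes, the sketch below combines per-modality estimates over the seven emotion classes by a weighted late fusion. This is not the authors' implementation: the class names follow the abstract, but the function fuse_posteriors, the modality weights, and the example posteriors are illustrative assumptions.

```python
# Minimal sketch of late multimodal fusion (illustrative, not the paper's method):
# each modality yields a posterior over the seven emotion classes, and the
# estimates are combined by a weighted sum and renormalized.
import numpy as np

EMOTIONS = ["surprise", "joy", "anger", "fear", "disgust", "sadness", "neutral"]

def fuse_posteriors(speech_post, haptic_post, w_speech=0.7, w_haptic=0.3):
    """Weighted-sum fusion of per-modality class posteriors (assumed weights)."""
    fused = w_speech * np.asarray(speech_post) + w_haptic * np.asarray(haptic_post)
    fused /= fused.sum()  # renormalize to a probability distribution
    return EMOTIONS[int(np.argmax(fused))], fused

# Illustrative posteriors, e.g. from a prosodic classifier (pitch, energy) and a
# haptic-interaction classifier (touch-screen or mouse behaviour).
speech = [0.05, 0.10, 0.55, 0.05, 0.05, 0.10, 0.10]
haptic = [0.10, 0.05, 0.40, 0.10, 0.05, 0.10, 0.20]

label, dist = fuse_posteriors(speech, haptic)
print(label, np.round(dist, 3))  # -> "anger" with the fused distribution
```

A weighted sum is only one plausible integration rule; product-of-posteriors or a learned combiner would fit the same late-fusion scheme.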