Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video

AVEC '14 Pub Date: 2014-11-07 DOI: 10.1145/2661806.2661811
Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen
{"title":"视频中多维情感识别的多尺度时间建模","authors":"Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen","doi":"10.1145/2661806.2661811","DOIUrl":null,"url":null,"abstract":"Understanding nonverbal behaviors in human machine interaction is a complex and challenge task. One of the key aspects is to recognize human emotion states accurately. This paper presents our effort to the Audio/Visual Emotion Challenge (AVEC'14), whose goal is to predict the continuous values of the emotion dimensions arousal, valence and dominance at each moment in time. The proposed method utilizes deep belief network based models to recognize emotion states from audio and visual modalities. Firstly, we employ temporal pooling functions in the deep neutral network to encode dynamic information in the features, which achieves the first time scale temporal modeling. Secondly, we combine the predicted results from different modalities and emotion temporal context information simultaneously. The proposed multimodal-temporal fusion achieves temporal modeling for the emotion states in the second time scale. Experiments results show the efficiency of each key point of the proposed method and competitive results are obtained","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video\",\"authors\":\"Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen\",\"doi\":\"10.1145/2661806.2661811\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding nonverbal behaviors in human machine interaction is a complex and challenge task. One of the key aspects is to recognize human emotion states accurately. This paper presents our effort to the Audio/Visual Emotion Challenge (AVEC'14), whose goal is to predict the continuous values of the emotion dimensions arousal, valence and dominance at each moment in time. The proposed method utilizes deep belief network based models to recognize emotion states from audio and visual modalities. Firstly, we employ temporal pooling functions in the deep neutral network to encode dynamic information in the features, which achieves the first time scale temporal modeling. Secondly, we combine the predicted results from different modalities and emotion temporal context information simultaneously. The proposed multimodal-temporal fusion achieves temporal modeling for the emotion states in the second time scale. 
Experiments results show the efficiency of each key point of the proposed method and competitive results are obtained\",\"PeriodicalId\":318508,\"journal\":{\"name\":\"AVEC '14\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AVEC '14\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2661806.2661811\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AVEC '14","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2661806.2661811","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 41

Abstract

Understanding nonverbal behaviors in human-machine interaction is a complex and challenging task. One of the key aspects is to recognize human emotion states accurately. This paper presents our contribution to the Audio/Visual Emotion Challenge (AVEC'14), whose goal is to predict the continuous values of the emotion dimensions arousal, valence and dominance at each moment in time. The proposed method utilizes deep belief network based models to recognize emotion states from audio and visual modalities. Firstly, we employ temporal pooling functions in the deep neural network to encode dynamic information in the features, which achieves temporal modeling at the first time scale. Secondly, we combine the predicted results from different modalities with emotion temporal context information simultaneously. The proposed multimodal-temporal fusion achieves temporal modeling of the emotion states at the second time scale. Experimental results show the effectiveness of each key component of the proposed method, and competitive results are obtained.
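
The abstract does not specify the pooling operator used inside the network, so the following is a minimal sketch of what temporal pooling over frame-level features could look like; the function name `temporal_pool`, the window size, and the mean/max modes are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def temporal_pool(frame_features, window=5, mode="mean"):
    """Pool frame-level features over a sliding temporal window.

    frame_features: (T, D) array of per-frame audio or visual descriptors.
    Returns a (T, D) array in which each row summarizes a window of
    neighboring frames, so short-term dynamics are encoded in the
    features before they enter the network.
    """
    T, D = frame_features.shape
    pooled = np.empty((T, D))
    half = window // 2
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        chunk = frame_features[lo:hi]
        # Mean pooling smooths over the window; max pooling keeps
        # the strongest activation per feature dimension.
        pooled[t] = chunk.mean(axis=0) if mode == "mean" else chunk.max(axis=0)
    return pooled
```

In the method as described, features pooled this way would feed the deep belief network based models, and the resulting per-modality predictions would then be fused with emotion temporal context in the second-stage multimodal-temporal fusion.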