Multimodal emotion recognition with automatic peak frame selection

2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA) Proceedings. Pub Date: 2014-06-23. DOI: 10.1109/INISTA.2014.6873606

Sara Zhalehpour, Z. Akhtar, Ç. Erdem


Citations: 23

Abstract

In this paper we present an effective framework for multimodal emotion recognition based on a novel approach for automatic peak frame selection from audio-visual video sequences. Given a video with an emotional expression, peak frames are the ones at which the emotion is at its apex. The objective of peak frame selection is to make the training process for the automatic emotion recognition system easier by summarizing the expressed emotion over a video sequence. The main steps of the proposed framework consist of extraction of video and audio features based on peak frame selection, unimodal classification, and decision-level fusion of audio and visual results. We evaluated the performance of our approach on the eNTERFACE'05 audio-visual database, which contains six basic emotional classes. Experimental results demonstrate the effectiveness and superiority of the proposed system over other methods in the literature.
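The abstract names the pipeline stages but gives no algorithmic detail, so the following is only a minimal sketch of two of them under stated assumptions, not the authors' method. `select_peak_frames` is a hypothetical stand-in heuristic that treats the first frame as near-neutral and picks the frames farthest from it; `fuse_decisions` illustrates decision-level fusion as a weighted sum of per-class scores from an audio and a visual classifier. The function names, the `audio_weight` parameter, and the emotion label strings are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed label names for the six basic emotion classes mentioned in the abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def select_peak_frames(frame_features: Sequence[Sequence[float]], k: int = 3) -> List[int]:
    """Return the indices (in temporal order) of the k frames farthest from frame 0.

    Assumption: frame 0 is close to a neutral expression, so Euclidean
    distance from it serves as a rough proxy for expression intensity.
    """
    reference = frame_features[0]

    def distance(frame: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(frame, reference)) ** 0.5

    ranked = sorted(
        range(len(frame_features)),
        key=lambda i: distance(frame_features[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def fuse_decisions(
    audio_probs: Sequence[float],
    visual_probs: Sequence[float],
    audio_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Decision-level fusion: weighted sum of the two classifiers' class scores."""
    if len(audio_probs) != len(visual_probs):
        raise ValueError("both classifiers must score the same set of classes")
    fused = [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(audio_probs, visual_probs)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Toy 2-D per-frame features; real systems would use facial descriptors.
    frames = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.0, 2.0], [0.5, 0.0]]
    print("peak frames:", select_peak_frames(frames, k=2))

    audio = [0.1, 0.1, 0.1, 0.5, 0.1, 0.1]
    visual = [0.05, 0.05, 0.6, 0.2, 0.05, 0.05]
    best, fused = fuse_decisions(audio, visual, audio_weight=0.3)
    print("fused label:", EMOTIONS[best])
```

A weighted sum is only one common fusion rule; product or max rules over the same per-class scores would slot into `fuse_decisions` without changing the rest of the sketch.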

