Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

J. Inf. Process. Syst. Pub Date : 2021-04-01 DOI:10.3745/JIPS.01.0067

Xuan Zhou


Citations: 7


Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To address this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. First, the video is preprocessed. Then, a double-layer cascade structure is used to detect the face in each video image. Next, two deep convolutional neural networks extract the temporal-domain and spatial-domain facial features from the video: the spatial convolutional neural network extracts spatial features from each static expression frame, and the temporal convolutional neural network extracts dynamic features from the optical flow computed across multiple expression frames. The spatiotemporal features learned by the two networks are then combined by multiplicative fusion. Finally, the fused features are fed to a support vector machine for facial expression classification. Experimental results on the eNTERFACE, RML, and AFEW 6.0 datasets show that the proposed method achieves recognition rates of 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.
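The multiplicative fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give feature dimensions, any normalization, or the exact fusion formula, so everything here is an assumption: the function names `l2_normalize` and `multiplicative_fusion` are hypothetical, the random arrays stand in for the outputs of the spatial and temporal networks, and the L2 normalization before the element-wise product is one common choice rather than the paper's confirmed procedure.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so neither stream dominates the product."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def multiplicative_fusion(spatial_feats, temporal_feats):
    """Fuse two (n_samples, dim) feature matrices by element-wise product.

    Assumption: both streams emit features of identical shape.
    """
    spatial_feats = np.asarray(spatial_feats, dtype=np.float64)
    temporal_feats = np.asarray(temporal_feats, dtype=np.float64)
    if spatial_feats.shape != temporal_feats.shape:
        raise ValueError("both streams must produce features of the same shape")
    return l2_normalize(spatial_feats) * l2_normalize(temporal_feats)


# Hypothetical stand-ins for CNN outputs: 4 video clips, 8-dimensional features.
rng = np.random.default_rng(0)
spatial = rng.normal(size=(4, 8))   # would come from the spatial (per-frame) network
temporal = rng.normal(size=(4, 8))  # would come from the temporal (optical-flow) network

fused = multiplicative_fusion(spatial, temporal)
print(fused.shape)  # (4, 8)
```

In a full pipeline, the rows of `fused` would serve as the input vectors to a support vector machine classifier, with one label per clip. The element-wise product emphasizes feature dimensions on which both streams respond strongly, which is one plausible motivation for multiplicative rather than additive fusion.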
