使用混合CNN和ConvLSTM的视频中的面部表情识别。

Rajesh Singh, Sumeet Saurav, Tarun Kumar, Ravi Saini, Anil Vohra, Sanjay Singh
{"title":"使用混合CNN和ConvLSTM的视频中的面部表情识别。","authors":"Rajesh Singh,&nbsp;Sumeet Saurav,&nbsp;Tarun Kumar,&nbsp;Ravi Saini,&nbsp;Anil Vohra,&nbsp;Sanjay Singh","doi":"10.1007/s41870-023-01183-0","DOIUrl":null,"url":null,"abstract":"<p><p>The three-dimensional convolutional neural network (3D-CNN) and long short-term memory (LSTM) have consistently outperformed many approaches in video-based facial expression recognition (VFER). The image is unrolled to a one-dimensional vector by the vanilla version of the fully-connected LSTM (FC-LSTM), which leads to the loss of crucial spatial information. Convolutional LSTM (ConvLSTM) overcomes this limitation by performing LSTM operations in convolutions without unrolling, thus retaining useful spatial information. Motivated by this, in this paper, we propose a neural network architecture that consists of a blend of 3D-CNN and ConvLSTM for VFER. The proposed hybrid architecture captures spatiotemporal information from the video sequences of emotions and attains competitive accuracy on three FER datasets open to the public, namely the SAVEE, CK + , and AFEW. The experimental results demonstrate excellent performance without external emotional data with the added advantage of having a simple model with fewer parameters. Moreover, unlike the state-of-the-art deep learning models, our designed FER pipeline improves execution speed by many factors while achieving competitive recognition accuracy. Hence, the proposed FER pipeline is an appropriate candidate for recognizing facial expressions on resource-limited embedded platforms for real-time applications.</p>","PeriodicalId":73455,"journal":{"name":"International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management","volume":"15 4","pages":"1819-1830"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10028317/pdf/","citationCount":"5","resultStr":"{\"title\":\"Facial expression recognition in videos using hybrid CNN & ConvLSTM.\",\"authors\":\"Rajesh Singh,&nbsp;Sumeet Saurav,&nbsp;Tarun Kumar,&nbsp;Ravi Saini,&nbsp;Anil Vohra,&nbsp;Sanjay Singh\",\"doi\":\"10.1007/s41870-023-01183-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The three-dimensional convolutional neural network (3D-CNN) and long short-term memory (LSTM) have consistently outperformed many approaches in video-based facial expression recognition (VFER). The image is unrolled to a one-dimensional vector by the vanilla version of the fully-connected LSTM (FC-LSTM), which leads to the loss of crucial spatial information. Convolutional LSTM (ConvLSTM) overcomes this limitation by performing LSTM operations in convolutions without unrolling, thus retaining useful spatial information. Motivated by this, in this paper, we propose a neural network architecture that consists of a blend of 3D-CNN and ConvLSTM for VFER. The proposed hybrid architecture captures spatiotemporal information from the video sequences of emotions and attains competitive accuracy on three FER datasets open to the public, namely the SAVEE, CK + , and AFEW. The experimental results demonstrate excellent performance without external emotional data with the added advantage of having a simple model with fewer parameters. Moreover, unlike the state-of-the-art deep learning models, our designed FER pipeline improves execution speed by many factors while achieving competitive recognition accuracy. Hence, the proposed FER pipeline is an appropriate candidate for recognizing facial expressions on resource-limited embedded platforms for real-time applications.</p>\",\"PeriodicalId\":73455,\"journal\":{\"name\":\"International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management\",\"volume\":\"15 4\",\"pages\":\"1819-1830\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10028317/pdf/\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s41870-023-01183-0\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/3/21 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41870-023-01183-0","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/3/21 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

三维卷积神经网络(3D-CNN)和长短期记忆(LSTM)在基于视频的面部表情识别(VFER)中一直优于许多方法。完全连接LSTM(FC-LSTM)的香草版本将图像展开为一维向量,这导致关键空间信息的丢失。卷积LSTM(ConvLSTM)通过在卷积中执行LSTM操作而不展开来克服这一限制,从而保留有用的空间信息。受此启发,在本文中,我们提出了一种用于VFER的神经网络架构,该架构由3D-CNN和ConvLSTM的混合组成。所提出的混合架构从情绪的视频序列中捕获时空信息,并在向公众开放的三个FER数据集上达到竞争精度,即SAVE、CK + , 和AFEW。实验结果表明,在没有外部情绪数据的情况下,具有出色的性能,同时具有参数较少的简单模型的额外优势。此外,与最先进的深度学习模型不同,我们设计的FER流水线在实现有竞争力的识别精度的同时,通过许多因素提高了执行速度。因此,所提出的FER流水线是在资源有限的嵌入式平台上识别实时应用的面部表情的合适候选者。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Facial expression recognition in videos using hybrid CNN & ConvLSTM.

Facial expression recognition in videos using hybrid CNN & ConvLSTM.

Facial expression recognition in videos using hybrid CNN & ConvLSTM.

Facial expression recognition in videos using hybrid CNN & ConvLSTM.

The three-dimensional convolutional neural network (3D-CNN) and long short-term memory (LSTM) have consistently outperformed many approaches in video-based facial expression recognition (VFER). The image is unrolled to a one-dimensional vector by the vanilla version of the fully-connected LSTM (FC-LSTM), which leads to the loss of crucial spatial information. Convolutional LSTM (ConvLSTM) overcomes this limitation by performing LSTM operations in convolutions without unrolling, thus retaining useful spatial information. Motivated by this, in this paper, we propose a neural network architecture that consists of a blend of 3D-CNN and ConvLSTM for VFER. The proposed hybrid architecture captures spatiotemporal information from the video sequences of emotions and attains competitive accuracy on three FER datasets open to the public, namely the SAVEE, CK + , and AFEW. The experimental results demonstrate excellent performance without external emotional data with the added advantage of having a simple model with fewer parameters. Moreover, unlike the state-of-the-art deep learning models, our designed FER pipeline improves execution speed by many factors while achieving competitive recognition accuracy. Hence, the proposed FER pipeline is an appropriate candidate for recognizing facial expressions on resource-limited embedded platforms for real-time applications.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信