Integrating end-to-end multimodal deep learning and domain adaptation for robust facial expression recognition

IF 4.2 | CAS Tier 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mahmoud Hassaballah, Chiara Pero, Ranjeet Kumar Rout, Saiyed Umer
{"title":"集成端到端多模态深度学习和领域自适应的鲁棒面部表情识别","authors":"Mahmoud Hassaballah ,&nbsp;Chiara Pero ,&nbsp;Ranjeet Kumar Rout ,&nbsp;Saiyed Umer","doi":"10.1016/j.imavis.2025.105548","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents an advanced approach to a facial expression recognition (FER) system designed for robust performance across diverse imaging environments. The proposed method consists of four primary components: image preprocessing, feature representation and classification, cross-domain feature analysis, and domain adaptation. The process begins with facial region extraction from input images, including those captured in unconstrained imaging conditions, where variations in lighting, background, and image quality significantly impact recognition performance. The extracted facial region undergoes feature extraction using an ensemble of multimodal deep learning techniques, including end-to-end CNNs, BilinearCNN, TrilinearCNN, and pretrained CNN models, which capture both local and global facial features with high precision. The ensemble approach enriches feature representation by integrating information from multiple models, enhancing the system’s ability to generalize across different subjects and expressions. These deep features are then passed to a classifier trained to recognize facial expressions effectively in real-time scenarios. Since images captured in real-world conditions often contain noise and artifacts that can compromise accuracy, cross-domain analysis is performed to evaluate the discriminative power and robustness of the extracted deep features. FER systems typically experience performance degradation when applied to domains that differ from the original training environment. To mitigate this issue, domain adaptation techniques are incorporated, enabling the system to effectively adjust to new imaging conditions and improving recognition accuracy even in challenging real-time acquisition environments. The proposed FER system is validated using four well-established benchmark datasets: CK+, KDEF, IMFDB and AffectNet. Experimental results demonstrate that the proposed system achieves high performance within original domains and exhibits superior cross-domain recognition compared to existing state-of-the-art methods. These findings indicate that the system is highly reliable for applications requiring robust and adaptive FER capabilities across varying imaging conditions and domains.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105548"},"PeriodicalIF":4.2000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Integrating end-to-end multimodal deep learning and domain adaptation for robust facial expression recognition\",\"authors\":\"Mahmoud Hassaballah ,&nbsp;Chiara Pero ,&nbsp;Ranjeet Kumar Rout ,&nbsp;Saiyed Umer\",\"doi\":\"10.1016/j.imavis.2025.105548\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper presents an advanced approach to a facial expression recognition (FER) system designed for robust performance across diverse imaging environments. The proposed method consists of four primary components: image preprocessing, feature representation and classification, cross-domain feature analysis, and domain adaptation. 
The process begins with facial region extraction from input images, including those captured in unconstrained imaging conditions, where variations in lighting, background, and image quality significantly impact recognition performance. The extracted facial region undergoes feature extraction using an ensemble of multimodal deep learning techniques, including end-to-end CNNs, BilinearCNN, TrilinearCNN, and pretrained CNN models, which capture both local and global facial features with high precision. The ensemble approach enriches feature representation by integrating information from multiple models, enhancing the system’s ability to generalize across different subjects and expressions. These deep features are then passed to a classifier trained to recognize facial expressions effectively in real-time scenarios. Since images captured in real-world conditions often contain noise and artifacts that can compromise accuracy, cross-domain analysis is performed to evaluate the discriminative power and robustness of the extracted deep features. FER systems typically experience performance degradation when applied to domains that differ from the original training environment. To mitigate this issue, domain adaptation techniques are incorporated, enabling the system to effectively adjust to new imaging conditions and improving recognition accuracy even in challenging real-time acquisition environments. The proposed FER system is validated using four well-established benchmark datasets: CK+, KDEF, IMFDB and AffectNet. Experimental results demonstrate that the proposed system achieves high performance within original domains and exhibits superior cross-domain recognition compared to existing state-of-the-art methods. These findings indicate that the system is highly reliable for applications requiring robust and adaptive FER capabilities across varying imaging conditions and domains.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"159 \",\"pages\":\"Article 105548\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625001362\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625001362","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract


This paper presents an advanced approach to a facial expression recognition (FER) system designed for robust performance across diverse imaging environments. The proposed method consists of four primary components: image preprocessing, feature representation and classification, cross-domain feature analysis, and domain adaptation. The process begins with facial region extraction from input images, including those captured in unconstrained imaging conditions, where variations in lighting, background, and image quality significantly impact recognition performance. The extracted facial region undergoes feature extraction using an ensemble of multimodal deep learning techniques, including end-to-end CNNs, BilinearCNN, TrilinearCNN, and pretrained CNN models, which capture both local and global facial features with high precision. The ensemble approach enriches feature representation by integrating information from multiple models, enhancing the system’s ability to generalize across different subjects and expressions. These deep features are then passed to a classifier trained to recognize facial expressions effectively in real-time scenarios. Since images captured in real-world conditions often contain noise and artifacts that can compromise accuracy, cross-domain analysis is performed to evaluate the discriminative power and robustness of the extracted deep features. FER systems typically experience performance degradation when applied to domains that differ from the original training environment. To mitigate this issue, domain adaptation techniques are incorporated, enabling the system to effectively adjust to new imaging conditions and improving recognition accuracy even in challenging real-time acquisition environments. The proposed FER system is validated using four well-established benchmark datasets: CK+, KDEF, IMFDB and AffectNet. Experimental results demonstrate that the proposed system achieves high performance within original domains and exhibits superior cross-domain recognition compared to existing state-of-the-art methods. These findings indicate that the system is highly reliable for applications requiring robust and adaptive FER capabilities across varying imaging conditions and domains.
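The pipeline's first stage is facial region extraction from unconstrained images. The abstract does not name a specific detector, so the sketch below uses OpenCV's Haar cascade purely as an illustrative stand-in; the detector choice, the 224-pixel crop size, and the largest-face heuristic are all assumptions, not the authors' reported pipeline.

```python
# Hypothetical preprocessing step: crop the facial region before feature
# extraction. The Haar cascade detector is an illustrative stand-in; the
# paper does not specify which face detector is used.
import cv2

def extract_face(image_path: str, size: int = 224):
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face detected; caller decides how to handle this
    # Keep the largest detection, a simple heuristic for unconstrained images.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return cv2.resize(img[y:y + h, x:x + w], (size, size))
```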
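Among the named ensemble members, BilinearCNN conventionally refers to pooling the outer product of two CNN feature streams so that local pairwise feature interactions are captured; TrilinearCNN extends this with a third stream. The abstract gives no implementation details, so the following is a minimal sketch under assumed choices (two VGG16 backbones, seven expression classes), not the authors' exact configuration.

```python
# Minimal bilinear-pooling sketch in the spirit of the BilinearCNN component
# named in the abstract. Backbone and class count are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class BilinearCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        weights = models.VGG16_Weights.IMAGENET1K_V1
        self.stream_a = models.vgg16(weights=weights).features  # (B, 512, h, w)
        self.stream_b = models.vgg16(weights=weights).features
        self.fc = nn.Linear(512 * 512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fa = self.stream_a(x).flatten(2)  # (B, 512, h*w)
        fb = self.stream_b(x).flatten(2)
        # Outer product of the two streams, averaged over spatial positions:
        # this is what lets bilinear pooling capture local pairwise
        # feature interactions.
        blf = torch.bmm(fa, fb.transpose(1, 2)) / fa.size(-1)  # (B, 512, 512)
        blf = blf.flatten(1)
        # Signed square root + L2 normalization, standard for bilinear features.
        blf = torch.sign(blf) * torch.sqrt(blf.abs() + 1e-10)
        return self.fc(F.normalize(blf))
```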
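The abstract also leaves the ensemble's fusion rule unspecified; averaging the softmax outputs of the member networks (end-to-end CNN, BilinearCNN, TrilinearCNN, pretrained CNNs) is one minimal reading, sketched here as an assumption rather than the authors' method.

```python
# Score-level fusion sketch: average the softmax outputs of the ensemble
# members. Simple averaging is an assumption; the paper may weight or
# combine the models differently.
import torch

@torch.no_grad()
def ensemble_predict(members, x: torch.Tensor) -> torch.Tensor:
    """Return the class index predicted by the averaged softmax scores."""
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in members])
    return probs.mean(dim=0).argmax(dim=1)
```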
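For the domain adaptation component, no specific technique is named either. Maximum Mean Discrepancy (MMD) is one common choice for aligning source- and target-domain feature distributions and is shown below as a representative stand-in; gradient-reversal (DANN-style) training would be another. The Gaussian kernel and its bandwidth are assumptions.

```python
# Representative domain adaptation loss: squared Maximum Mean Discrepancy
# (MMD) between batches of source- and target-domain deep features.
import torch

def mmd_squared(source: torch.Tensor, target: torch.Tensor,
                sigma: float = 1.0) -> torch.Tensor:
    """Biased estimate of MMD^2 between two batches of feature vectors."""
    def rbf(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Gaussian (RBF) kernel over pairwise squared distances.
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return (rbf(source, source).mean()
            + rbf(target, target).mean()
            - 2 * rbf(source, target).mean())
```

During training this term would be added to the source-domain classification loss with a trade-off weight, so the feature extractor learns representations that are both discriminative and domain-invariant.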
Source journal
Image and Vision Computing (Engineering Technology - Engineering: Electrical & Electronic)
CiteScore: 8.50
Self-citation rate: 8.50%
Annual publications: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.