Robust emotion recognition in thermal imaging with convolutional neural networks and grey wolf optimization

IF 2.7 · Zone 3, Engineering & Technology · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication Pub Date : 2025-06-06 DOI:10.1016/j.image.2025.117363

Anselme Atchogou, Cengiz Tepe

Volume 138, Article 117363 · https://www.sciencedirect.com/science/article/pii/S0923596525001092

Abstract

Facial Expression Recognition (FER) is a pivotal technology in human-computer interaction, with applications spanning psychology, virtual reality, and advanced driver assistance systems. Traditional FER using visible-light cameras struggles with low-light conditions, shadows, and reflections. This study explores thermal imaging as an alternative, leveraging its ability to capture heat radiation and overcome lighting issues. We propose a novel approach that combines pre-trained models, particularly EfficientNet variants, with Grey Wolf Optimization (GWO) and various classifiers for robust emotion recognition. Ten pre-trained CNN models, including variants of EfficientNet (EfficientNet-B0, B3, B4, B7, V2L, V2M, V2S), ResNet50, MobileNet, and InceptionResNetV2, are used to extract features from thermal images. GWO is employed to optimize the parameters of four classifiers: Support Vector Machine (SVM), Random Forest, Gradient Boosting, and k-Nearest Neighbors (kNN). Two popular thermal image datasets, IRDatabase and KTFE, are used to evaluate the proposed method. Combining EfficientNet-B7 with GWO and kNN or SVM for eight distinct emotions (fear, anger, contempt, disgust, happiness, neutrality, sadness, and surprise) yielded the highest accuracy of 91.42% on the IRDatabase dataset. Combining EfficientNet-B7 with GWO and Gradient Boosting for seven distinct emotions (anger, disgust, fear, happiness, neutrality, sadness, and surprise) yielded the highest accuracy of 99.48% on the KTFE dataset. These results demonstrate the effectiveness and reliability of the proposed approach for emotion identification in thermal images, making it a viable way to overcome the drawbacks of conventional visible-light-based FER systems.
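
The abstract does not give the GWO settings or the hyperparameter ranges that were searched, so the following is only a minimal sketch of how a basic Grey Wolf Optimizer can tune classifier parameters. The function names `grey_wolf_optimize` and `toy_validation_error` are hypothetical, and the toy objective is an assumed stand-in for a real cross-validated error (for example, the error of an SVM trained on CNN features as a function of log10(C) and log10(gamma)). This variant re-selects the three leading wolves from the current population at every iteration, which may differ from the authors' implementation.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=20, n_iters=100, seed=0):
    """Minimise `objective` over the box `bounds` with a basic GWO loop.

    `bounds` is a list of (low, high) pairs, one per search dimension.
    Returns (best_position, best_score) seen during the whole run.
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for t in range(n_iters):
        # Rank the pack; the three best wolves act as alpha, beta and delta.
        scored = sorted(((objective(w), w) for w in wolves), key=lambda p: p[0])
        if scored[0][0] < best_score:
            best_score, best_pos = scored[0][0], list(scored[0][1])
        leaders = [list(w) for _, w in scored[:3]]

        # `a` decays linearly from 2 to 0: exploration first, exploitation later.
        a = 2.0 * (1.0 - t / n_iters)
        new_wolves = []
        for w in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - w[d])
                    total += leader[d] - A * dist
                # Average the three leader-guided moves and clip to the box.
                pos.append(min(max(total / 3.0, lo), hi))
            new_wolves.append(pos)
        wolves = new_wolves

    # Score the final pack so the last move is not lost.
    for w in wolves:
        score = objective(w)
        if score < best_score:
            best_score, best_pos = score, list(w)
    return best_pos, best_score


def toy_validation_error(params):
    """Assumed stand-in for a cross-validated error surface (not real data)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2 + 0.05


if __name__ == "__main__":
    search_space = [(-3.0, 3.0), (-5.0, 1.0)]  # hypothetical log10 ranges
    position, score = grey_wolf_optimize(toy_validation_error, search_space)
    print("best parameters:", position, "error:", score)
```

In a real pipeline the objective would train the chosen classifier on the extracted CNN features and return its validation error, so each call is far more expensive than this toy function; the pack size and iteration count then trade search quality against compute time.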

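Source journal

Signal Processing-Image Communication (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 8.40
Self-citation rate: 2.90%
Articles published: 138
Review time: 5.2 months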

About the journal: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are to present a forum for the advancement of theory and practice of image communication; to stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems; and to contribute to a rapid information exchange between the industrial and academic environments.

The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.

Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments. Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, and architectures for image/video processing and communication.