Tingting Han, Shuwei Dou, Wenxia Zhang, Ruqian Liu
Digital Signal Processing, Vol. 168, Article 105470. DOI: 10.1016/j.dsp.2025.105470. Published 2025-07-11.
Enhanced dynamic temporal feature extraction with static expression insights for dynamic facial expression recognition
Dynamic Facial Expression Recognition (DFER) is a critical task in computer vision that involves recognizing and analyzing changes in facial expressions across video sequences. Extracting the temporal features of facial emotions in videos is one of the main challenges facing DFER. This paper proposes a model named RTT, based on IR50-Transformer-TFEM, to enhance dynamic temporal feature extraction with static expression insights for DFER. Specifically, the IR50 in RTT focuses on extracting static facial features from each frame of the video, while the Transformer works in conjunction with our Time Feature Enhancement Module (TFEM) to extract temporal features from the video sequence. TFEM is built after the Transformer, aiming to explore deeper temporal information. TFEM consists mainly of two components: a Feature Mapping Network (FMN) and a Temporal Dependency Network (TDN). FMN enhances temporal information through feature interaction and feature weighting, while TDN encodes temporal dependencies in sequences to improve sensitivity to complex dynamic expressions. Finally, a feature representation carrying both facial emotional and temporal features is formed for DFER. We present promising results that surpass current state-of-the-art (SOTA) techniques on two widely recognized DFER benchmark datasets, DFEW and FERV39K. On DFEW, the model achieves 71.24% unweighted average recall (UAR) and 86.81% weighted average recall (WAR); on FERV39K, it reaches 48.59% UAR and 60.42% WAR. These experimental results indicate that our approach outperforms existing SOTA methods on the DFER task, suggesting the effectiveness of the RTT model.
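The abstract describes a pipeline of per-frame static feature extraction (IR50), sequence-level mixing (Transformer), and a TFEM stage composed of FMN and TDN. The paper itself does not specify the internals, so the sketch below is only an illustrative NumPy approximation of that data flow: the backbone, attention layer, gating in `fmn`, and causal averaging in `tdn` are all assumed stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # assumed feature dimension

def static_backbone(frames):
    # Stand-in for IR50: project each flattened frame to a D-dim static feature.
    # frames: (T, H, W); the real IR50 is a ResNet-style face backbone.
    T = frames.shape[0]
    proj = rng.standard_normal((frames.shape[1] * frames.shape[2], D)) * 0.01
    return frames.reshape(T, -1) @ proj            # (T, D)

def self_attention(X):
    # Minimal single-head self-attention as a stand-in for the Transformer.
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X                                   # (T, D)

def fmn(X):
    # Feature Mapping Network (assumed form): per-feature sigmoid weighting,
    # one simple way to realize "feature interaction and feature weighting".
    gate = 1.0 / (1.0 + np.exp(-X.mean(axis=0)))   # (D,) weights in (0, 1)
    return X * gate

def tdn(X):
    # Temporal Dependency Network (assumed form): causal running mean so each
    # time step encodes its dependence on earlier steps.
    return np.cumsum(X, axis=0) / np.arange(1, X.shape[0] + 1)[:, None]

def rtt_forward(frames):
    static_feats = static_backbone(frames)  # per-frame static expression features
    temporal = self_attention(static_feats) # sequence-level temporal mixing
    enhanced = tdn(fmn(temporal))           # TFEM = FMN followed by TDN
    return enhanced.mean(axis=0)            # clip-level representation, (D,)

clip = rng.standard_normal((16, 112, 112))  # toy clip: 16 frames of 112x112
rep = rtt_forward(clip)
print(rep.shape)
```

The point of the sketch is the ordering, static per-frame features first, then temporal mixing, then a dedicated temporal-enhancement stage, which is the structural claim the abstract makes about RTT.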
Journal description:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing, including seismic signal processing
• chemoinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy