Tingting Han, Shuwei Dou, Wenxia Zhang, Ruqian Liu
Digital Signal Processing, Vol. 168, Article 105470. DOI: 10.1016/j.dsp.2025.105470. Published 2025-07-11.
Enhanced dynamic temporal feature extraction with static expression insights for dynamic facial expression recognition
Dynamic Facial Expression Recognition (DFER) is a critical task in computer vision that involves recognizing and analyzing changes in facial expressions across video sequences. Extracting the temporal features of facial emotions in videos is one of the main challenges facing DFER. This paper proposes a model named RTT, based on IR50-Transformer-TFEM, to enhance dynamic temporal feature extraction with static expression insights for DFER. Specifically, the IR50 in RTT focuses on extracting static facial features from each frame of the video, while the Transformer works in conjunction with our Time Feature Enhancement Module (TFEM) to extract temporal features from the video sequence. TFEM is built after the Transformer, aiming to explore deeper temporal information. TFEM consists mainly of two components: a Feature Mapping Network (FMN) and a Temporal Dependency Network (TDN). FMN enhances temporal information through feature interaction and feature weighting, while TDN encodes temporal dependencies in sequences to improve sensitivity to complex dynamic expressions. Finally, a feature representation carrying both facial emotional and temporal features is formed for DFER. We present promising results that surpass current state-of-the-art (SOTA) techniques on two widely recognized DFER benchmark datasets, DFEW and FERV39K. On DFEW, the model achieves 71.24% unweighted average recall (UAR) and 86.81% weighted average recall (WAR); on FERV39K, it reaches 48.59% UAR and 60.42% WAR. These experimental results indicate that our approach outperforms existing SOTA methods on the DFER task, suggesting the effectiveness of the RTT model.
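The abstract describes a pipeline of per-frame static feature extraction (IR50), sequence-level mixing (Transformer), and a TFEM stage composed of FMN and TDN. The paper itself does not specify the internals, so the sketch below is only an illustrative NumPy approximation of that data flow: the backbone, attention layer, gating in `fmn`, and causal averaging in `tdn` are all assumed stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # assumed feature dimension

def static_backbone(frames):
    # Stand-in for IR50: project each flattened frame to a D-dim static feature.
    # frames: (T, H, W); the real IR50 is a ResNet-style face backbone.
    T = frames.shape[0]
    proj = rng.standard_normal((frames.shape[1] * frames.shape[2], D)) * 0.01
    return frames.reshape(T, -1) @ proj            # (T, D)

def self_attention(X):
    # Minimal single-head self-attention as a stand-in for the Transformer.
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X                                   # (T, D)

def fmn(X):
    # Feature Mapping Network (assumed form): per-feature sigmoid weighting,
    # one simple way to realize "feature interaction and feature weighting".
    gate = 1.0 / (1.0 + np.exp(-X.mean(axis=0)))   # (D,) weights in (0, 1)
    return X * gate

def tdn(X):
    # Temporal Dependency Network (assumed form): causal running mean so each
    # time step encodes its dependence on earlier steps.
    return np.cumsum(X, axis=0) / np.arange(1, X.shape[0] + 1)[:, None]

def rtt_forward(frames):
    static_feats = static_backbone(frames)  # per-frame static expression features
    temporal = self_attention(static_feats) # sequence-level temporal mixing
    enhanced = tdn(fmn(temporal))           # TFEM = FMN followed by TDN
    return enhanced.mean(axis=0)            # clip-level representation, (D,)

clip = rng.standard_normal((16, 112, 112))  # toy clip: 16 frames of 112x112
rep = rtt_forward(clip)
print(rep.shape)
```

The point of the sketch is the ordering, static per-frame features first, then temporal mixing, then a dedicated temporal-enhancement stage, which is the structural claim the abstract makes about RTT.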
Journal description:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing, including seismic signal processing
• chemoinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy