Yang Lü, Fuchun Zhang, Zongnan Ma, Bo Zheng, Zhixiong Nan
Neurocomputing, vol. 636, Article 130020. Journal Article. Published 2025-03-22. DOI: 10.1016/j.neucom.2025.130020. URL: https://www.sciencedirect.com/science/article/pii/S0925231225006927. Impact factor: 5.5 (JCR Q1, Computer Science, Artificial Intelligence).
Dynamic facial expression recognition in the wild via Multi-Snippet Spatiotemporal Learning
Dynamic Facial Expression Recognition (DFER) in the wild poses a significant challenge in emotion recognition research. Many studies focus on extracting finer facial features while overlooking the effect of noisy frames on the entire sequence. In addition, the imbalance between short- and long-term temporal relationships remains inadequately addressed. To tackle these issues, we propose the Multi-Snippet Spatiotemporal Learning (MSSL) framework, which uses distinct temporal and spatial modeling for snippet feature extraction, enabling more accurate modeling of subtle facial expression changes while capturing finer details. We also introduce a dual-branch hierarchical module, BiTemporal Multi-Snippet Enhancement (BTMSE), which is designed to capture spatiotemporal dependencies and to model subtle visual changes across snippets effectively. The Temporal-Transformer further enhances the learning of long-term dependencies, whereas learnable temporal position embeddings ensure consistency between snippet and fused features over time. By leveraging (2+1)D multi-snippet spatiotemporal modeling, BTMSE, and the Temporal-Transformer, MSSL hierarchically explores the complex interrelationships between temporal dynamics and facial expressions. Comparative experiments and ablation studies confirm the effectiveness of our method on three large-scale in-the-wild datasets: DFEW, FERV39K, and MAFW.
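The two ideas the abstract names, splitting a clip into snippets and (2+1)D spatiotemporal modeling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-length contiguous split, and the channel sizes are assumptions made for illustration, and the abstract does not say how MSSL samples its snippets. The parameter count relies only on the standard definition of a (2+1)D convolution, in which a t×d×d 3D kernel is factorized into a 1×d×d spatial convolution followed by a t×1×1 temporal convolution.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_snippets(frames: Sequence[Frame], num_snippets: int) -> List[List[Frame]]:
    """Split a clip into contiguous, non-overlapping, equal-length snippets.

    Assumption (hypothetical, not taken from the paper): trailing frames that
    do not fill a whole snippet are dropped.
    """
    if num_snippets <= 0:
        raise ValueError("num_snippets must be positive")
    length = len(frames) // num_snippets
    if length == 0:
        raise ValueError("clip is shorter than the number of snippets")
    return [list(frames[i * length:(i + 1) * length]) for i in range(num_snippets)]


def conv3d_params(c_in: int, c_out: int, t: int, d: int) -> int:
    """Weight count of a full t x d x d 3D convolution (bias ignored)."""
    return t * d * d * c_in * c_out


def conv2plus1d_params(c_in: int, c_out: int, t: int, d: int, c_mid: int) -> int:
    """Weight count of a (2+1)D block (bias ignored).

    The block is a 1 x d x d spatial convolution into c_mid channels followed
    by a t x 1 x 1 temporal convolution into c_out channels.
    """
    spatial = d * d * c_in * c_mid
    temporal = t * c_mid * c_out
    return spatial + temporal


if __name__ == "__main__":
    clip = list(range(16))  # stand-in for 16 video frames
    print(split_into_snippets(clip, 4))
    print(conv3d_params(64, 64, t=3, d=3))
    print(conv2plus1d_params(64, 64, t=3, d=3, c_mid=144))
```

With 64 input and output channels and a 3×3×3 kernel, an intermediate width of 144 channels gives the factorized block the same number of weights as the full 3D convolution (110592), so the factorization separates spatial from temporal modeling without changing the parameter budget.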
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.