Dual-task enhanced global–local temporal–spatial network for depression recognition from facial videos
Jinjie Shen, Jing Wu, Yan Xing, Min Hu, Xiaohua Wang, Daolun Li, Wenshu Zha
Concurrency and Computation: Practice and Experience, 36(25), published 2024-08-21. DOI: 10.1002/cpe.8255
Citations: 0
Abstract
In previous studies on facial video depression recognition, convolutional neural networks (CNNs) have become the mainstream method, yet their performance still leaves room for improvement due to insufficient extraction of global and local information and neglect of the correlation between temporal and spatial information. This paper proposes a novel dual-task enhanced global–local temporal–spatial network (DTE-GLTS) to enhance the extraction of global and local features and deepen the analysis of temporal–spatial correlation. We design a dual-task learning mode that uses the data-efficient image transformer (DeiT) as the main body to learn global features of video sequences, while a pre-trained temporal–spatial fusion (TSF) network guides DeiT to learn local features. In addition, we propose the TSF mechanism to fuse temporal–spatial information in video sequences more effectively and strengthen the correlation between frames and pixels; embedding it in ResNet yields the TSF network. To the best of our knowledge, this is the first application of DeiT and a dual-task learning mode to facial video depression recognition. Experimental results on AVEC 2013 and AVEC 2014 show that our method achieves competitive performance, with mean absolute error/root mean square error (MAE/RMSE) scores of 6.06/7.73 and 5.91/7.68, respectively, while significantly reducing the number of parameters.
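The abstract's two ideas, a temporal–spatial fusion block that mixes information across frames and pixels, and a dual-task objective in which a global transformer branch is additionally trained to match features from a frozen local branch, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `TSFBlock`, `DualTaskModel`, the layer sizes, and the loss weighting are all hypothetical, with a generic transformer encoder standing in for DeiT and plain 3D convolutions standing in for the ResNet-embedded TSF mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TSFBlock(nn.Module):
    """Hypothetical temporal-spatial fusion block: a temporal (across-frame)
    convolution followed by a spatial (across-pixel) convolution, added back
    residually, as a stand-in for the paper's TSF mechanism."""
    def __init__(self, channels):
        super().__init__()
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        return x + self.spatial(self.temporal(x))

class DualTaskModel(nn.Module):
    """Global branch (generic transformer encoder standing in for DeiT) with
    two heads: one regresses the depression score, the other predicts the
    frozen local branch's features (the guidance task)."""
    def __init__(self, in_dim=32, dim=64):
        super().__init__()
        self.patch_embed = nn.Linear(in_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score_head = nn.Linear(dim, 1)    # main task: depression score
        self.local_head = nn.Linear(dim, dim)  # auxiliary task: match TSF features

    def forward(self, tokens):  # tokens: (batch, n_patches, in_dim)
        z = self.encoder(self.patch_embed(tokens))
        pooled = z.mean(dim=1)
        return self.score_head(pooled), self.local_head(pooled)

# Temporal-spatial fusion on a toy clip: shape is preserved, content is mixed.
clip = torch.randn(2, 3, 8, 16, 16)          # (batch, channels, frames, H, W)
fused = TSFBlock(channels=3)(clip)

# Dual-task loss: score regression plus feature guidance from a (here random,
# in practice pre-trained and frozen) TSF network.
model = DualTaskModel()
tokens = torch.randn(2, 16, 32)              # (batch, patches, patch features)
score, local_pred = model(tokens)
tsf_target = torch.randn(2, 64)              # stand-in for frozen TSF features
loss = F.mse_loss(score.squeeze(-1), torch.tensor([6.0, 8.0])) \
     + F.mse_loss(local_pred, tsf_target)
```

In this sketch the auxiliary MSE term is what "guides" the global branch toward local features; the paper's actual guidance formulation and its weighting are not specified in the abstract.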
Journal overview:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.