Cheng Peng, Guodong Li, Likang Lin, Bowen Zhang, Kun Zou, Sio Long Lo, Ah Chung Tsoi
{"title":"一种基于交叉方向注意网络的面部表情识别方法","authors":"Cheng Peng , Guodong Li , Likang Lin , Bowen Zhang , Kun Zou , Sio Long Lo , Ah Chung Tsoi","doi":"10.1016/j.engappai.2025.111229","DOIUrl":null,"url":null,"abstract":"<div><div>Facial expression recognition (FER) is an area of growing interest in computer vision research. This paper extends the framework provided by the ‘Distract your Attention Network’ (DAN) which consists of multiple parallel branches, each branch composes of a spatial attention (SA) module followed by a channel attention (CA) module, and then these multiple branches are fused together before being passed into a classifier module. The spatial attention module of DAN has an internal channel dimension of 1, while our proposed Cross Directional Attention Network (CDAN)-I and CDAN-II contain respectively an internal channel dimension of 512 (same as the channel dimension of the input), and internal channel dimension of 1024 (double that of the channel dimension of the input). These increases in internal channel dimension allow extraction of more features, before they are being made to conform with the input channel dimension. Despite these seemingly simple modifications from that of DAN, both CDAN-I and CDAN-II are found to outperform those of DAN, a state-of-the-art FER method, on four popular FER benchmark datasets: RAF-DB (Real world Affective Face-database), AffectNet-7 (AffectNet with Seven Categories) AffectNet-8 ( AffectNet with Eight Categories), and CK+ (Cohn–Kanada Extended). Moreover, we make use of three statistical indexes for clustering analysis, and verified that the CDAN-I and CDAN-II modules have been able to increase the inter-cluster distances, and decrease the intra-cluster distances, when compared with those obtained by the backbone ResNet-18 network (Residual Network with 18 Layers) , thus providing a quantitative analysis technique in this area.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"157 ","pages":"Article 111229"},"PeriodicalIF":7.5000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A novel facial expression recognition method based on cross direction attention network\",\"authors\":\"Cheng Peng , Guodong Li , Likang Lin , Bowen Zhang , Kun Zou , Sio Long Lo , Ah Chung Tsoi\",\"doi\":\"10.1016/j.engappai.2025.111229\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Facial expression recognition (FER) is an area of growing interest in computer vision research. This paper extends the framework provided by the ‘Distract your Attention Network’ (DAN) which consists of multiple parallel branches, each branch composes of a spatial attention (SA) module followed by a channel attention (CA) module, and then these multiple branches are fused together before being passed into a classifier module. The spatial attention module of DAN has an internal channel dimension of 1, while our proposed Cross Directional Attention Network (CDAN)-I and CDAN-II contain respectively an internal channel dimension of 512 (same as the channel dimension of the input), and internal channel dimension of 1024 (double that of the channel dimension of the input). These increases in internal channel dimension allow extraction of more features, before they are being made to conform with the input channel dimension. 
Despite these seemingly simple modifications from that of DAN, both CDAN-I and CDAN-II are found to outperform those of DAN, a state-of-the-art FER method, on four popular FER benchmark datasets: RAF-DB (Real world Affective Face-database), AffectNet-7 (AffectNet with Seven Categories) AffectNet-8 ( AffectNet with Eight Categories), and CK+ (Cohn–Kanada Extended). Moreover, we make use of three statistical indexes for clustering analysis, and verified that the CDAN-I and CDAN-II modules have been able to increase the inter-cluster distances, and decrease the intra-cluster distances, when compared with those obtained by the backbone ResNet-18 network (Residual Network with 18 Layers) , thus providing a quantitative analysis technique in this area.</div></div>\",\"PeriodicalId\":50523,\"journal\":{\"name\":\"Engineering Applications of Artificial Intelligence\",\"volume\":\"157 \",\"pages\":\"Article 111229\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Applications of Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0952197625012308\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625012308","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
A novel facial expression recognition method based on cross direction attention network
Facial expression recognition (FER) is an area of growing interest in computer vision research. This paper extends the framework of the ‘Distract Your Attention Network’ (DAN), which consists of multiple parallel branches, each composed of a spatial attention (SA) module followed by a channel attention (CA) module; the outputs of these branches are fused together before being passed to a classifier module. The spatial attention module of DAN has an internal channel dimension of 1, whereas our proposed Cross Directional Attention Network variants, CDAN-I and CDAN-II, have internal channel dimensions of 512 (equal to the channel dimension of the input) and 1024 (double the channel dimension of the input), respectively. These larger internal channel dimensions allow more features to be extracted before the output is projected back to the input channel dimension. Despite these seemingly simple modifications to DAN, both CDAN-I and CDAN-II outperform DAN, a state-of-the-art FER method, on four popular FER benchmark datasets: RAF-DB (Real-world Affective Faces Database), AffectNet-7 (AffectNet with seven categories), AffectNet-8 (AffectNet with eight categories), and CK+ (Extended Cohn–Kanade). Moreover, we use three statistical indexes for clustering analysis and verify that the CDAN-I and CDAN-II modules increase the inter-cluster distances and decrease the intra-cluster distances compared with those obtained by the ResNet-18 backbone (Residual Network with 18 layers), thus providing a quantitative analysis technique in this area.
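To make the architectural change concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of a spatial-attention-style block in which the internal channel dimension is a free parameter: with an input of 512 channels, setting it to 1 mimics the DAN configuration, 512 the CDAN-I configuration, and 1024 the CDAN-II configuration described above. All module and parameter names are hypothetical.

import torch
import torch.nn as nn

class SpatialAttentionSketch(nn.Module):
    """Generic spatial-attention-style block with a configurable internal
    (hidden) channel dimension; illustrative only, not the CDAN architecture."""
    def __init__(self, in_channels: int = 512, hidden_channels: int = 512):
        super().__init__()
        # Project to the internal channel dimension, mix spatially, then
        # project back so the output conforms to the input channel dimension.
        self.reduce = nn.Conv2d(in_channels, hidden_channels, kernel_size=1)
        self.mix = nn.Conv2d(hidden_channels, hidden_channels, kernel_size=3, padding=1)
        self.expand = nn.Conv2d(hidden_channels, in_channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.act(self.reduce(x))
        a = self.act(self.mix(a))
        a = torch.sigmoid(self.expand(a))  # attention map, same shape as x
        return x * a                       # reweight the input feature map

# Example: a ResNet-18-style feature map (batch 2, 512 channels, 7x7 spatial grid).
feats = torch.randn(2, 512, 7, 7)
dan_like = SpatialAttentionSketch(512, hidden_channels=1)       # DAN-style setting
cdan1_like = SpatialAttentionSketch(512, hidden_channels=512)   # CDAN-I-style setting
cdan2_like = SpatialAttentionSketch(512, hidden_channels=1024)  # CDAN-II-style setting
print(dan_like(feats).shape, cdan1_like(feats).shape, cdan2_like(feats).shape)

In all three settings the output shape matches the input, so such a block can be dropped into a DAN-style branch unchanged; only the capacity of the intermediate representation differs.

Likewise, the abstract does not name the three statistical indexes used for the clustering analysis, so the sketch below simply computes three standard cluster-validity indexes from scikit-learn on feature embeddings as an illustration of this kind of quantitative comparison; the embeddings and labels here are random placeholders.

import numpy as np
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

def cluster_quality(features: np.ndarray, labels: np.ndarray) -> dict:
    # Higher silhouette / Calinski-Harabasz and lower Davies-Bouldin scores
    # indicate larger inter-cluster and smaller intra-cluster distances.
    return {
        "silhouette": silhouette_score(features, labels),
        "davies_bouldin": davies_bouldin_score(features, labels),
        "calinski_harabasz": calinski_harabasz_score(features, labels),
    }

# Placeholder embeddings: 200 samples, 512-D features, 7 expression classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 7, size=200)
backbone_feats = rng.normal(size=(200, 512))         # e.g. ResNet-18 features
attention_feats = backbone_feats + labels[:, None]   # e.g. features after attention
print(cluster_quality(backbone_feats, labels))
print(cluster_quality(attention_feats, labels))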
Journal introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.