Expression recognition method combining convolutional features and Transformer
Xiaoning Zhu, Zhongyi Li, Jian Sun. Mathematical Foundations of Computing (2023). DOI: 10.3934/mfc.2022018
Facial expression recognition has long been an important research direction in psychology. Because expressions convey human feelings through the muscles around the mouth, the eyes, and the rest of the face, recognizing them is useful in traffic, medicine, security, and criminal investigation. Most existing work classifies expressions from face images with convolutional neural networks (CNNs). This approach does achieve good results, but CNNs are limited in their ability to extract global features. Transformers are better at extracting global features, but they are more computationally expensive and require large amounts of training data. In this paper we therefore apply a hierarchical Transformer, the Swin Transformer, to expression recognition, which greatly reduces the computational cost relative to a standard Transformer. We also fuse it with a CNN model, yielding a network architecture that combines the Transformer and CNN. To the best of our knowledge, this is the first work to combine the Swin Transformer with a CNN for expression recognition. We evaluate the proposed method on several publicly available expression datasets and obtain competitive results.
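The cost reduction attributed to the Swin Transformer comes mainly from computing self-attention inside non-overlapping local windows, as described in the Swin Transformer paper (Liu et al., 2021). For an h × w token map with C channels, global multi-head self-attention costs about 4hwC² + 2(hw)²C operations, while window attention with M × M windows costs about 4hwC² + 2M²hwC. The second form is linear rather than quadratic in the number of tokens. The following is a minimal sketch comparing the two formulas. The 56 × 56 × 96 feature map and 7 × 7 window are assumed Swin-T stage-1 settings, not configuration details reported for the method in this abstract. The counts also ignore softmax, normalization, the MLP layers, and the shifted-window step.

```python
# Sketch: compare the attention cost of global multi-head self-attention (MSA)
# with Swin-style window-based MSA (W-MSA), using the complexity formulas from
# the Swin Transformer paper. These are operation-count estimates only; they
# ignore softmax, normalization, the MLP block, and window shifting.


def msa_cost(h: int, w: int, c: int) -> int:
    """Global MSA over an h x w token map with c channels: 4hwC^2 + 2(hw)^2 C."""
    n = h * w  # number of tokens
    return 4 * n * c * c + 2 * n * n * c


def window_msa_cost(h: int, w: int, c: int, m: int) -> int:
    """W-MSA with non-overlapping m x m windows: 4hwC^2 + 2 M^2 hw C."""
    if h % m or w % m:
        raise ValueError("feature map must be divisible by the window size")
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c


if __name__ == "__main__":
    # Assumed example settings (Swin-T stage 1 on a 224x224 input),
    # not values taken from the expression-recognition paper.
    h = w = 56
    c = 96
    m = 7
    full = msa_cost(h, w, c)
    windowed = window_msa_cost(h, w, c, m)
    print(f"global MSA : {full:,}")
    print(f"window MSA : {windowed:,}")
    print(f"ratio      : {full / windowed:.1f}x")
```

With these assumed settings, window attention needs roughly 14 times less computation than global attention. The gap widens as input resolution grows, because only the global term is quadratic in hw. When a single window covers the whole map (M² = hw), the two formulas coincide.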