{"title":"基于多层次特征融合的头部姿势估计","authors":"Chunman Yan, Xiao Zhang","doi":"10.1142/s0218001424560020","DOIUrl":null,"url":null,"abstract":"<p>Head Pose Estimation (HPE) has a wide range of applications in computer vision, but still faces challenges: (1) Existing studies commonly use Euler angles or quaternions as pose labels, which may lead to discontinuity problems. (2) HPE does not effectively address regression via rotated matrices. (3) There is a low recognition rate in complex scenes, high computational requirements, etc. This paper presents an improved unconstrained HPE model to address these challenges. First, a rotation matrix form is introduced to solve the problem of unclear rotation labels. Second, a continuous 6D rotation matrix representation is used for efficient and robust direct regression. The RepVGG-A2 lightweight framework is used for feature extraction, and by adding a multi-level feature fusion module and a coordinate attention mechanism with residual connection, to improve the network’s ability to perceive contextual information and pay attention to features. The model’s accuracy was further improved by replacing the network activation function and improving the loss function. Experiments on the BIWI dataset 7:3 dividing the training and test sets show that the average absolute error of HPE for the proposed network model is 2.41. Trained on the dataset 300W_LP and tested on the AFLW2000 and BIWI datasets, the average absolute errors of HPE of the proposed network model are 4.34 and 3.93. The experimental results demonstrate that the improved network has better HPE performance.</p>","PeriodicalId":54949,"journal":{"name":"International Journal of Pattern Recognition and Artificial Intelligence","volume":"32 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Head Pose Estimation Based on Multi-Level Feature Fusion\",\"authors\":\"Chunman Yan, Xiao Zhang\",\"doi\":\"10.1142/s0218001424560020\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Head Pose Estimation (HPE) has a wide range of applications in computer vision, but still faces challenges: (1) Existing studies commonly use Euler angles or quaternions as pose labels, which may lead to discontinuity problems. (2) HPE does not effectively address regression via rotated matrices. (3) There is a low recognition rate in complex scenes, high computational requirements, etc. This paper presents an improved unconstrained HPE model to address these challenges. First, a rotation matrix form is introduced to solve the problem of unclear rotation labels. Second, a continuous 6D rotation matrix representation is used for efficient and robust direct regression. The RepVGG-A2 lightweight framework is used for feature extraction, and by adding a multi-level feature fusion module and a coordinate attention mechanism with residual connection, to improve the network’s ability to perceive contextual information and pay attention to features. The model’s accuracy was further improved by replacing the network activation function and improving the loss function. Experiments on the BIWI dataset 7:3 dividing the training and test sets show that the average absolute error of HPE for the proposed network model is 2.41. Trained on the dataset 300W_LP and tested on the AFLW2000 and BIWI datasets, the average absolute errors of HPE of the proposed network model are 4.34 and 3.93. The experimental results demonstrate that the improved network has better HPE performance.</p>\",\"PeriodicalId\":54949,\"journal\":{\"name\":\"International Journal of Pattern Recognition and Artificial Intelligence\",\"volume\":\"32 1\",\"pages\":\"\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2024-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Pattern Recognition and Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1142/s0218001424560020\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Pattern Recognition and Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1142/s0218001424560020","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Head Pose Estimation Based on Multi-Level Feature Fusion
Head Pose Estimation (HPE) has a wide range of applications in computer vision, but still faces challenges: (1) Existing studies commonly use Euler angles or quaternions as pose labels, which may lead to discontinuity problems. (2) HPE does not effectively address regression via rotated matrices. (3) There is a low recognition rate in complex scenes, high computational requirements, etc. This paper presents an improved unconstrained HPE model to address these challenges. First, a rotation matrix form is introduced to solve the problem of unclear rotation labels. Second, a continuous 6D rotation matrix representation is used for efficient and robust direct regression. The RepVGG-A2 lightweight framework is used for feature extraction, and by adding a multi-level feature fusion module and a coordinate attention mechanism with residual connection, to improve the network’s ability to perceive contextual information and pay attention to features. The model’s accuracy was further improved by replacing the network activation function and improving the loss function. Experiments on the BIWI dataset 7:3 dividing the training and test sets show that the average absolute error of HPE for the proposed network model is 2.41. Trained on the dataset 300W_LP and tested on the AFLW2000 and BIWI datasets, the average absolute errors of HPE of the proposed network model are 4.34 and 3.93. The experimental results demonstrate that the improved network has better HPE performance.
期刊介绍:
The International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) welcomes both theory-oriented and innovative applications articles on new developments and is of interest to both researchers in academia and industry.
The current scope of this journal includes:
• Pattern Recognition
• Machine Learning
• Deep Learning
• Document Analysis
• Image Processing
• Signal Processing
• Computer Vision
• Biometrics
• Biomedical Image Analysis
• Artificial Intelligence
In addition to regular papers describing original research work, survey articles on timely and important research topics are highly welcome. Special issues with focused topics within the scope of this journal are also published.