Head Pose Estimation Based on Multi-Level Feature Fusion

IF 1.1, Zone 4 (Computer Science), Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Pattern Recognition and Artificial Intelligence Pub Date : 2024-02-28 DOI:10.1142/s0218001424560020

Chunman Yan, Xiao Zhang


Citations: 0

Abstract

Head Pose Estimation (HPE) has a wide range of applications in computer vision, but it still faces challenges: (1) existing studies commonly use Euler angles or quaternions as pose labels, which may lead to discontinuity problems; (2) existing HPE methods do not effectively address regression via rotation matrices; (3) recognition rates are low in complex scenes, and computational requirements are high. This paper presents an improved unconstrained HPE model to address these challenges. First, a rotation matrix form is introduced to solve the problem of ambiguous rotation labels. Second, a continuous 6D rotation matrix representation is used for efficient and robust direct regression. The lightweight RepVGG-A2 framework is used for feature extraction, and a multi-level feature fusion module and a coordinate attention mechanism with a residual connection are added to improve the network’s ability to perceive contextual information and attend to features. The model’s accuracy is further improved by replacing the network activation function and improving the loss function. Experiments on the BIWI dataset, split 7:3 into training and test sets, show that the mean absolute error (MAE) of HPE for the proposed network model is 2.41. When trained on the 300W_LP dataset and tested on the AFLW2000 and BIWI datasets, the proposed network model achieves MAEs of 4.34 and 3.93, respectively. The experimental results demonstrate that the improved network has better HPE performance.
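The abstract does not spell out how the 6D representation is turned into a rotation matrix or how the error metric is computed. The sketch below illustrates one common construction, assuming the 6D output is read as two raw 3D vectors that are orthonormalized by a Gram-Schmidt step, with the third axis given by a cross product. The function names `rotation_from_6d` and `mean_absolute_error` are hypothetical, and the paper's actual implementation may differ.

```python
import math


def _normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(six):
    """Map six raw numbers to a 3x3 rotation matrix (assumed construction).

    The first three values form vector a1, the last three form a2.
    Gram-Schmidt yields orthonormal b1 and b2; b3 = b1 x b2 completes
    a right-handed frame. The columns of the result are b1, b2, b3.
    """
    a1, a2 = list(six[:3]), list(six[3:6])
    b1 = _normalize(a1)
    proj = _dot(b1, a2)
    # Remove the component of a2 along b1, then normalize the remainder.
    b2 = _normalize([y - proj * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def mean_absolute_error(pred_angles, true_angles):
    """Average |pred - true| over all samples and all three angles.

    Each sample is a (yaw, pitch, roll) triple. Angle wrap-around is
    not handled in this sketch.
    """
    total, count = 0.0, 0
    for pred, true in zip(pred_angles, true_angles):
        for p, t in zip(pred, true):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("no samples given")
    return total / count
```

Because every nondegenerate 6D vector maps to a valid rotation matrix, a network can regress the six numbers directly without any constraint on its output layer. The sketch raises `ValueError` only when the two input vectors are zero or exactly parallel.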

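Source journal

International Journal of Pattern Recognition and Artificial Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

2.90

Self-citation rate

13.30%

Number of articles published

201

Review time

15.8 months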

Journal description: The International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) welcomes both theory-oriented articles and innovative application articles on new developments, and is of interest to researchers in both academia and industry. The current scope of this journal includes:
• Pattern Recognition
• Machine Learning
• Deep Learning
• Document Analysis
• Image Processing
• Signal Processing
• Computer Vision
• Biometrics
• Biomedical Image Analysis
• Artificial Intelligence
In addition to regular papers describing original research work, survey articles on timely and important research topics are highly welcome. Special issues with focused topics within the scope of this journal are also published.