A lightweight attention-driven distillation model for human pose estimation

IF 3.9 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters Pub Date : 2024-09-01 DOI:10.1016/j.patrec.2024.08.009

Falai Wei, Xiaofang Hu

{"title":"A lightweight attention-driven distillation model for human pose estimation","authors":"Falai Wei, Xiaofang Hu","doi":"10.1016/j.patrec.2024.08.009","DOIUrl":null,"url":null,"abstract":"<div><p>Currently, research on human pose estimation tasks primarily focuses on heatmap-based and regression-based methods. However, the increasing complexity of heatmap models and the low accuracy of regression methods are becoming significant barriers to the advancement of the field. In recent years, researchers have begun exploring new methods to transfer knowledge from heatmap models to regression models. Recognizing the limitations of existing approaches, our study introduces a novel distillation model that is both lightweight and precise. In the feature extraction phase, we design the Channel-Attention-Unit (CAU), which integrates group convolution with an attention mechanism to effectively reduce redundancy while maintaining model accuracy with a decreased parameter count. During distillation, we develop the attention loss function, <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>A</mi></mrow></msub></math></span>, which enhances the model’s capacity to locate key points quickly and accurately, emulating the effect of additional transformer layers and boosting precision without the need for increased parameters or network depth. Specifically, on the CrowdPose test dataset, our model achieves 71.7% mAP with 4.3M parameters, 2.2 GFLOPs, and 51.3 FPS. Experimental results demonstrates the model’s strong capabilities in both accuracy and efficiency, making it a viable option for real-time posture estimation tasks in real-world environments.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 247-253"},"PeriodicalIF":3.9000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865524002411","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Currently, research on human pose estimation tasks primarily focuses on heatmap-based and regression-based methods. However, the increasing complexity of heatmap models and the low accuracy of regression methods are becoming significant barriers to the advancement of the field. In recent years, researchers have begun exploring new methods to transfer knowledge from heatmap models to regression models. Recognizing the limitations of existing approaches, our study introduces a novel distillation model that is both lightweight and precise. In the feature extraction phase, we design the Channel-Attention-Unit (CAU), which integrates group convolution with an attention mechanism to effectively reduce redundancy while maintaining model accuracy with a decreased parameter count. During distillation, we develop the attention loss function, $L_{A}$ , which enhances the model’s capacity to locate key points quickly and accurately, emulating the effect of additional transformer layers and boosting precision without the need for increased parameters or network depth. Specifically, on the CrowdPose test dataset, our model achieves 71.7% mAP with 4.3M parameters, 2.2 GFLOPs, and 51.3 FPS. Experimental results demonstrates the model’s strong capabilities in both accuracy and efficiency, making it a viable option for real-time posture estimation tasks in real-world environments.

查看原文本刊更多论文

用于人类姿势估计的轻量级注意力驱动蒸馏模型

目前，有关人体姿态估计任务的研究主要集中在基于热图和回归的方法上。然而，热图模型的日益复杂性和回归方法的低准确性正成为该领域发展的重大障碍。近年来，研究人员开始探索从热图模型向回归模型转移知识的新方法。认识到现有方法的局限性，我们的研究引入了一种既轻便又精确的新型蒸馏模型。在特征提取阶段，我们设计了通道-注意力单元（CAU），它将群卷积与注意力机制整合在一起，有效减少了冗余，同时在减少参数数量的情况下保持了模型的准确性。在蒸馏过程中，我们开发了注意力损失函数 LA，该函数增强了模型快速、准确定位关键点的能力，模拟了额外变压器层的效果，并在无需增加参数或网络深度的情况下提高了精度。具体来说，在 CrowdPose 测试数据集上，我们的模型在 4.3M 参数、2.2 GFLOPs 和 51.3 FPS 的条件下实现了 71.7% 的 mAP。实验结果表明，该模型在准确性和效率方面都具有很强的能力，使其成为现实环境中实时姿态估计任务的可行选择。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Pattern Recognition Letters 工程技术-计算机：人工智能

CiteScore

12.40

自引率

5.90%

发文量

287

审稿时长

9.1 months

期刊介绍： Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.