A lightweight attention-driven distillation model for human pose estimation

IF 3.9, Region 3 (Computer Science), JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Falai Wei, Xiaofang Hu
Pattern Recognition Letters, Volume 185, Pages 247-253
Publication date: 2024-09-01
DOI: 10.1016/j.patrec.2024.08.009
URL: https://www.sciencedirect.com/science/article/pii/S0167865524002411
Citations: 0

Abstract

Research on human pose estimation currently centers on heatmap-based and regression-based methods. However, the growing complexity of heatmap models and the low accuracy of regression methods have become significant barriers to progress in the field. In recent years, researchers have begun exploring ways to transfer knowledge from heatmap models to regression models. To address the limitations of existing approaches, we introduce a novel distillation model that is both lightweight and precise. In the feature extraction phase, we design the Channel-Attention-Unit (CAU), which integrates group convolution with an attention mechanism to reduce redundancy while maintaining accuracy at a lower parameter count. During distillation, we develop an attention loss function, L_A, which improves the model's ability to locate keypoints quickly and accurately, emulating the effect of additional transformer layers and boosting precision without increasing parameters or network depth. On the CrowdPose test set, our model achieves 71.7% mAP with 4.3M parameters, 2.2 GFLOPs, and 51.3 FPS. Experimental results demonstrate the model's strong accuracy and efficiency, making it a viable option for real-time pose estimation in real-world environments.
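
The abstract states that the CAU pairs group convolution with an attention mechanism to cut parameters, but it does not give the unit's internal layout. The sketch below only illustrates the general parameter arithmetic behind that claim. The function names, the channel width of 64, the 3x3 kernel, the 4 groups, and the squeeze-and-excitation style gate with reduction 4 are all assumptions chosen for illustration, not the authors' configuration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a bias-free 2D convolution split into `groups` channel groups.

    Each output channel only sees c_in / groups input channels, so the
    weight count shrinks by a factor of `groups` relative to a dense conv.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def gate_params(channels, reduction=4):
    """Weight count of an assumed SE-style channel gate.

    Hypothetical layout: two bias-free linear layers,
    channels -> channels // reduction -> channels.
    """
    hidden = channels // reduction
    return channels * hidden + hidden * channels


# Illustrative sizes only; the paper's actual widths are not given in the abstract.
dense = conv_params(64, 64, 3)               # ordinary 3x3 convolution
grouped = conv_params(64, 64, 3, groups=4)   # same layer with 4 groups
gate = gate_params(64, reduction=4)          # added attention gate

print("dense conv weights:  ", dense)
print("grouped conv weights:", grouped)
print("attention gate:      ", gate)
print("grouped + gate:      ", grouped + gate)
```

With these assumed sizes, the dense layer has 36864 weights, while the grouped layer plus gate has 9216 + 2048 = 11264, roughly 31% of the dense count. The attention gate is cheap relative to the savings from grouping, which is the general trade-off the abstract describes.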

Source journal: Pattern Recognition Letters (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months
About the journal: Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.