Deep Bharatanatyam pose recognition: a wavelet multi head progressive attention

IF 3.7 · Zone 4, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
D. Anil Kumar, P. V. V. Kishore, K. Sravani
Journal: Pattern Analysis and Applications
Publication date: 2024-05-02
DOI: 10.1007/s10044-024-01273-0 (https://doi.org/10.1007/s10044-024-01273-0)
Citations: 0

Abstract

Identifying human poses in 2D video sequences is extremely challenging because of recording artifacts such as lighting variation, sensor motion, and unpredictable subject movement. In this work, the objective is to recognize rhythmic human poses in independently sourced online videos of Bharatanatyam, an Indian classical dance (ICD) form. The data set (BOICDVD22) consists of internet-sourced video frames of 5 different songs performed by 10 dancers, labelled with the corresponding lyrical classes. Training models on this multi-sourced online data and achieving decent inference accuracy is a challenging task. Past works focused on creating small, offline, non-shareable ICD datasets and applied standard deep learning models, which resulted in unsatisfactory performance. Recently, attention-based feature learning has been driving the performance of deep learning models. The most suitable attention mechanism for online data is wavelet-based attention. Though successful, wavelet-based feature learning is applied at a single layer and depends on global average pooling (GAP) in both the channel and spatial dimensions. The current generation of wavelet attention has resulted in unbalanced spatial attention across the video frames. To overcome this imbalance and induce human-like attention, this work proposes replacing the GAP-based wavelet channel or spatial attention at a particular layer of the backbone architecture with wavelet multi-head progressive attention (WMHPA). Removing GAP strengthens the attention mechanism and reduces information loss. The progressive nature of the attention enables WMHPA to distribute attention features evenly across all the video frames. The results show the highest accuracy on the dance data set due to multi-resolution attention across the entire network. WMHPA is validated against state-of-the-art methods on our ICD data set as well as on benchmarked person re-identification action datasets.
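
The abstract's point about GAP-induced information loss can be illustrated with a small sketch. This is not the authors' WMHPA and does not reproduce any part of their network. The function names `gap_weight` and `haar_dwt2`, the choice of the Haar wavelet, and the toy inputs `flat` and `stripes` are all assumptions made here for illustration. The sketch shows only that two feature maps with the same mean are indistinguishable after GAP, while a one-level wavelet decomposition still separates them.

```python
import math

def gap_weight(fmap):
    """Global average pooling of one 2D channel, followed by a sigmoid.
    The whole H x W map collapses to a single scalar weight."""
    h, w = len(fmap), len(fmap[0])
    mean = sum(sum(row) for row in fmap) / (h * w)
    return 1.0 / (1.0 + math.exp(-mean))

def haar_dwt2(fmap):
    """One-level orthonormal 2D Haar transform of an H x W map (H, W even).
    Returns four (H/2) x (W/2) sub-bands: approximation, detail along the
    width, detail along the height, and diagonal detail."""
    h, w = len(fmap), len(fmap[0])
    approx, d_width, d_height, d_diag = [], [], [], []
    for i in range(0, h, 2):
        row_a, row_w, row_h, row_d = [], [], [], []
        for j in range(0, w, 2):
            # The four samples of the current 2 x 2 block.
            a, b = fmap[i][j], fmap[i][j + 1]
            c, d = fmap[i + 1][j], fmap[i + 1][j + 1]
            row_a.append((a + b + c + d) / 2)
            row_w.append((a - b + c - d) / 2)
            row_h.append((a + b - c - d) / 2)
            row_d.append((a - b - c + d) / 2)
        approx.append(row_a)
        d_width.append(row_w)
        d_height.append(row_h)
        d_diag.append(row_d)
    return approx, d_width, d_height, d_diag

# Hypothetical toy inputs: both maps have mean 0.5, but only one has structure.
flat = [[0.5] * 4 for _ in range(4)]
stripes = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]

same_gap = gap_weight(flat) == gap_weight(stripes)
flat_detail = haar_dwt2(flat)[1]
stripes_detail = haar_dwt2(stripes)[1]
print(same_gap, flat_detail, stripes_detail)
```

In this toy case the GAP weights coincide, whereas the width-detail sub-band is zero for the flat map and non-zero for the striped one. An attention mechanism that reads wavelet sub-bands directly therefore has access to spatial structure that a pooled scalar discards.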


Source journal
Pattern Analysis and Applications (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 7.40
Self-citation rate: 2.60%
Articles published: 76
Review time: 13.5 months
Journal description: The journal publishes high quality articles in areas of fundamental research in intelligent pattern analysis and applications in computer science and engineering. It aims to provide a forum for original research which describes novel pattern analysis techniques and industrial applications of the current technology. In addition, the journal will also publish articles on pattern analysis applications in medical imaging. The journal solicits articles that detail new technology and methods for pattern recognition and analysis in applied domains including, but not limited to, computer vision and image processing, speech analysis, robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern recognition, fractal analysis, and intelligent control. The journal publishes articles on the use of advanced pattern recognition and analysis methods including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning, and hardware implementations which are either relevant to the development of pattern analysis as a research area or detail novel pattern analysis applications. Papers proposing new classifier systems or their development, pattern analysis systems for real-time applications, fuzzy and temporal pattern recognition and uncertainty management in applied pattern recognition are particularly solicited.