Deep Bharatanatyam pose recognition: a wavelet multi head progressive attention
Authors: D. Anil Kumar, P. V. V. Kishore, K. Sravani
Journal: Pattern Analysis and Applications (published 2024-05-02)
DOI: https://doi.org/10.1007/s10044-024-01273-0
Human pose identification from 2D video sequences is extremely challenging because of recording artifacts such as lighting variation, sensor motion, and unpredictable subject movement. This work aims to recognize rhythmic human poses in independently sourced online videos of Bharatanatyam, an Indian classical dance (ICD) form. The dataset (BOICDVD22) consists of internet-sourced video frames of 5 different songs performed by 10 dancers, labelled into the corresponding lyrical classes. Achieving reasonable inference accuracy with models trained on such multi-sourced online data is difficult. Past work focused on small, offline, non-shareable ICD datasets and standard deep learning models, which yielded unsatisfactory performance. Recently, attention-based feature learning has been driving the performance of deep learning models, and wavelet-based attention is the most suitable attention mechanism for online data. Though successful, wavelet-based feature learning is applied at a single layer and depends on global average pooling (GAP) in both the channel and spatial dimensions. The current generation of wavelet attention produces unbalanced spatial attention across the video frames. To overcome this imbalance and induce human-like attention, this work proposes replacing the GAP-based wavelet channel or spatial attention at a particular layer of the backbone architecture with wavelet multi-head progressive attention (WMHPA). WMHPA strengthens the attention mechanism and reduces information loss because it does not use GAP. The progressive nature of the attention enables WMHPA to distribute attention features evenly across all the video frames. The results show the highest accuracy on the dance dataset, which is attributed to multi-resolution attention across the entire network. WMHPA is validated against state-of-the-art methods on our ICD dataset as well as on benchmark person re-identification action datasets.
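The point about GAP-related information loss can be illustrated with a toy example. The following is a minimal sketch, not the authors' WMHPA implementation: it only shows that two feature maps with the same global average can differ in their wavelet detail sub-bands. The function names (`gap`, `haar_dwt2`, `band_energy`), the toy maps, and the choice of a single-level Haar wavelet are all assumptions made for illustration; the abstract does not state which wavelet is used.

```python
def gap(fmap):
    """Global average pooling: collapse a 2D map to one scalar."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def haar_dwt2(fmap):
    """Single-level orthonormal 2D Haar transform (assumed wavelet).

    Expects even height and width. Returns four half-resolution
    sub-bands keyed "LL", "LH", "HL", "HH".
    """
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for i in range(0, len(fmap), 2):
        rows = {"LL": [], "LH": [], "HL": [], "HH": []}
        for j in range(0, len(fmap[0]), 2):
            a, b = fmap[i][j], fmap[i][j + 1]
            c, d = fmap[i + 1][j], fmap[i + 1][j + 1]
            rows["LL"].append((a + b + c + d) / 2)  # local average (low-pass)
            rows["LH"].append((a - b + c - d) / 2)  # change across columns
            rows["HL"].append((a + b - c - d) / 2)  # change across rows
            rows["HH"].append((a - b - c + d) / 2)  # diagonal change
        for name in bands:
            bands[name].append(rows[name])
    return bands


def band_energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for row in band for v in row)


# Two hypothetical 4x4 feature maps with the same mean value.
flat = [[1.0] * 4 for _ in range(4)]
stripes = [[2.0, 0.0, 2.0, 0.0] for _ in range(4)]

for name, fmap in (("flat", flat), ("stripes", stripes)):
    energies = {k: band_energy(v) for k, v in haar_dwt2(fmap).items()}
    print(name, "GAP =", gap(fmap), "sub-band energies =", energies)
```

Both maps have the same GAP value, so a GAP-based descriptor cannot tell them apart, while the column-difference sub-band separates the striped map from the flat one. How WMHPA actually combines sub-bands across heads and layers is not specified in the abstract and is not modelled here.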
About the journal:
The journal publishes high quality articles in areas of fundamental research in intelligent pattern analysis and applications in computer science and engineering. It aims to provide a forum for original research which describes novel pattern analysis techniques and industrial applications of the current technology. In addition, the journal will also publish articles on pattern analysis applications in medical imaging. The journal solicits articles that detail new technology and methods for pattern recognition and analysis in applied domains including, but not limited to, computer vision and image processing, speech analysis, robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern recognition, fractal analysis, and intelligent control. The journal publishes articles on the use of advanced pattern recognition and analysis methods including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning, and hardware implementations which are either relevant to the development of pattern analysis as a research area or detail novel pattern analysis applications. Papers proposing new classifier systems or their development, pattern analysis systems for real-time applications, fuzzy and temporal pattern recognition and uncertainty management in applied pattern recognition are particularly solicited.