Tong Zhang, Qilin Li, Jingtao Wen, C.L. Philip Chen
Information Fusion, vol. 111, Article 102522. Published 2024-06-09. DOI: 10.1016/j.inffus.2024.102522
Impact factor 14.7; JCR Q1 (Computer Science, Artificial Intelligence).
https://www.sciencedirect.com/science/article/pii/S1566253524003002
Enhancement and optimisation of human pose estimation with multi-scale spatial attention and adversarial data augmentation
Human pose estimation, a vital pursuit in the realm of computer vision, aims to predict the spatial coordinates of key points within images. Despite the advancements achieved by employing Convolutional Neural Networks (CNNs), this task still faces considerable challenges, especially in handling occlusion and overfitting. This paper introduces a new human pose estimation network designed to address the challenges posed by occluded and blurred images. It features a multi-scale spatial attention mechanism that zeroes in on the human body, significantly improving feature extraction for complex images. Moreover, this versatile attention module is compatible with a wide range of CNN-based pose estimation frameworks, unlike other mechanisms restricted to particular networks. To address overfitting in human pose estimation models, this paper introduces an adversarial network-based data augmentation technique. A generator specifically tailored for pose estimation is adversarially trained to produce optimal augmentation samples, thereby reducing model overfitting. Experimental validation confirms that this augmentation method notably enhances the prediction accuracy of the pose estimation model without incurring extra computational costs. In addition, this paper introduces a streamlined Feature Pyramid Network (FPN) that enables shallow networks to assimilate extensive-scale data, addressing the issue of excessive model size. Experimental validation on the benchmark datasets MPII and MSCOCO demonstrates the efficacy of this integrated approach, showing significant improvements in the accuracy and overall performance of human pose estimation and surpassing existing methods. The approach improves on the baseline model, achieving best accuracies of 92.2% and 80.4% on MPII and MSCOCO, respectively.
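The abstract does not give the internals of the multi-scale spatial attention module, so the following is only a minimal, weight-free sketch of the general idea, not the authors' design. It assumes a feature map stored as a C x H x W nested list, pools a channel-averaged saliency map over several neighbourhood sizes, and squashes the fused map into a per-location weight. The function names, the kernel sizes (1, 3, 5), and the use of box averaging in place of learned convolutions are all illustrative assumptions.

```python
import math


def box_blur(grid, k):
    """Average each cell with its k x k neighbourhood (edges use the valid part only)."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                grid[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def multi_scale_spatial_attention(features, kernel_sizes=(1, 3, 5)):
    """Return (reweighted features, attention mask) for a C x H x W nested list.

    Hypothetical, weight-free stand-in for a learned attention module.
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])
    # Collapse channels into a single H x W saliency map.
    saliency = [
        [sum(features[ch][y][x] for ch in range(c)) / c for x in range(w)]
        for y in range(h)
    ]
    # Pool the saliency map at several neighbourhood sizes and average the results.
    pooled = [box_blur(saliency, k) for k in kernel_sizes]
    fused = [
        [sum(p[y][x] for p in pooled) / len(pooled) for x in range(w)]
        for y in range(h)
    ]
    # Centre the map so above-average locations receive weights above 0.5.
    mean = sum(map(sum, fused)) / (h * w)
    mask = [[1.0 / (1.0 + math.exp(-(v - mean))) for v in row] for row in fused]
    # Apply the same spatial mask to every channel.
    out = [
        [[features[ch][y][x] * mask[y][x] for x in range(w)] for y in range(h)]
        for ch in range(c)
    ]
    return out, mask
```

Because the mask depends only on spatial position and is applied identically to every channel, a module of this shape can in principle be placed after any convolutional stage, which is consistent with the abstract's claim of compatibility with a range of CNN-based frameworks. A trained version would replace the box averages with learned filters.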
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.