Efficient High-Resolution Network for Human Pose Estimation
T. Tran, Xuan-Thuy Vo, Duy-Linh Nguyen, K. Jo
2022 International Workshop on Intelligent Systems (IWIS), published 2022-08-17
DOI: 10.1109/IWIS56333.2022.9920796 (https://doi.org/10.1109/IWIS56333.2022.9920796)
Convolutional neural networks (CNNs) currently achieve the best performance not only in 2D and 3D pose estimation but also in many other machine vision applications (e.g., image classification, semantic segmentation, and object detection). In addition, attention modules (AMs) have proven to be a leading approach for improving the accuracy of neural networks. Hence, this work focuses on designing a suitable feed-forward AM for CNNs that reduces computational cost while also improving accuracy. First, the input tensor is fed into the attention mechanism, which consists of two main parts: a channel attention module and a spatial attention module. After that, the tensor passes through a stage of the backbone network. The main mechanism then multiplies these two feature maps and sends the result to the next stage of the backbone. In this way the network enhances the features in terms of long-range dependencies (channels) and spatial information. We also highlight how the use of the attention mechanism differs from current approaches. The proposed method outperforms the HRNet baseline by 1.3 points in AP while keeping the number of parameters nearly unchanged. Our architecture was trained on the COCO 2017 dataset, which is available as an open benchmark.
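The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the layers inside either attention module or the exact wiring, so everything here is an assumption made for illustration. Channel attention is assumed to be a sigmoid over the global average of each channel, spatial attention a sigmoid over the per-pixel mean across channels, and `backbone_stage` is a hypothetical stand-in (a plain ReLU) for a real HRNet stage. Plain Python lists are used so the sketch runs without any dependencies.

```python
import math

def sigmoid(v):
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def channel_attention(x):
    """Assumed form: one weight per channel, from the global average over H x W."""
    return [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in x]

def spatial_attention(x):
    """Assumed form: one weight per pixel, from the mean across channels."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]

def attention_map(x):
    """Combine both branches into one C x H x W map of weights in (0, 1)."""
    ca, sa = channel_attention(x), spatial_attention(x)
    return [[[ca_k * s for s in sa_row] for sa_row in sa] for ca_k in ca]

def backbone_stage(x):
    """Hypothetical stand-in for a backbone stage: element-wise ReLU only."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]

def attended_stage(x):
    """Multiply the stage output by the attention map, element by element."""
    feats, attn = backbone_stage(x), attention_map(x)
    return [[[f * a for f, a in zip(f_row, a_row)]
             for f_row, a_row in zip(f_ch, a_ch)]
            for f_ch, a_ch in zip(feats, attn)]

# Toy input: 2 channels, 2 x 2 pixels.
x = [[[1.0, -1.0], [2.0, 0.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
y = attended_stage(x)
print(len(y), len(y[0]), len(y[0][0]))  # shape is preserved: C, H, W
```

Because the attention branch in this sketch adds no learned weights, it leaves the parameter count untouched. A real module would typically add a small number of learned parameters, which would be consistent with the abstract's claim that the parameter count changes little.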