{"title":"结合深度学习网络和变压器的三维人体姿态估计","authors":"T. Tran, Xuan-Thuy Vo, Duy-Linh Nguyen, K. Jo","doi":"10.23919/ICCAS55662.2022.10003954","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have attained the maximum performance today not just for human pose estimation but also for other machine vision applications (e.g., semantic segmentation, object detection, image classification). Besides, the Transformer shows its good performance for extracting the information in temporal information for video challenges. As a result, the combination of deep learner and transformer gains a better performance than only the utility one, especially for 3D human pose estimation. At the start point, input the 2D key point into the deep learner layer and transformer and then use the additional function to combine the extracted information. Finally, the network collects more data in terms of using the fully connected layer to generate the 3D human pose which makes the result increased precision efficiency. Our research would also reveal the relationship between the use of the deep learner and transformer. When compared to the baseline-DNNs, the suggested architecture outperforms the baseline-DNNs average error under Protocol 1 and Protocol 2 in the Human3.6M dataset, which is now available as a popular dataset for 3D human pose estimation.","PeriodicalId":129856,"journal":{"name":"2022 22nd International Conference on Control, Automation and Systems (ICCAS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Combination of Deep Learner Network and Transformer for 3D Human Pose Estimation\",\"authors\":\"T. Tran, Xuan-Thuy Vo, Duy-Linh Nguyen, K. Jo\",\"doi\":\"10.23919/ICCAS55662.2022.10003954\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) have attained the maximum performance today not just for human pose estimation but also for other machine vision applications (e.g., semantic segmentation, object detection, image classification). Besides, the Transformer shows its good performance for extracting the information in temporal information for video challenges. As a result, the combination of deep learner and transformer gains a better performance than only the utility one, especially for 3D human pose estimation. At the start point, input the 2D key point into the deep learner layer and transformer and then use the additional function to combine the extracted information. Finally, the network collects more data in terms of using the fully connected layer to generate the 3D human pose which makes the result increased precision efficiency. Our research would also reveal the relationship between the use of the deep learner and transformer. 
When compared to the baseline-DNNs, the suggested architecture outperforms the baseline-DNNs average error under Protocol 1 and Protocol 2 in the Human3.6M dataset, which is now available as a popular dataset for 3D human pose estimation.\",\"PeriodicalId\":129856,\"journal\":{\"name\":\"2022 22nd International Conference on Control, Automation and Systems (ICCAS)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 22nd International Conference on Control, Automation and Systems (ICCAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/ICCAS55662.2022.10003954\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 22nd International Conference on Control, Automation and Systems (ICCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/ICCAS55662.2022.10003954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: Deep neural networks (DNNs) currently achieve top performance not only for human pose estimation but also for other machine vision applications such as semantic segmentation, object detection, and image classification. The Transformer, in turn, performs well at extracting temporal information in video tasks. Combining a deep learner with a transformer therefore yields better performance than using either component alone, especially for 3D human pose estimation. In the proposed network, the 2D keypoints are fed into a deep learner layer and a transformer in parallel, and the information extracted by the two branches is combined by an addition operation. A fully connected layer then generates the 3D human pose from the fused features, which improves the precision of the result. Our research also examines the relationship between the deep learner and the transformer. On the Human3.6M dataset, a popular benchmark for 3D human pose estimation, the proposed architecture achieves a lower average error than the baseline DNN under both Protocol 1 and Protocol 2.
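
To make the two-branch pipeline described above concrete, below is a minimal sketch in PyTorch of how 2D keypoints could flow through a deep learner branch and a transformer branch, be fused by addition, and be mapped to a 3D pose by a fully connected layer. The class name DeepLearnerTransformer3DPose, the layer sizes, the 17-joint layout (as in Human3.6M), the per-joint tokenization, and the mean pooling over tokens are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the two-branch design described in the abstract, assuming a
# PyTorch implementation. All layer sizes, the 17-joint layout, and module names
# are illustrative assumptions; the abstract does not specify them.
import torch
import torch.nn as nn


class DeepLearnerTransformer3DPose(nn.Module):
    def __init__(self, num_joints=17, embed_dim=256, num_heads=8, num_layers=2):
        super().__init__()
        in_dim = num_joints * 2          # flattened 2D keypoints (x, y per joint)
        out_dim = num_joints * 3         # flattened 3D pose (x, y, z per joint)

        # "Deep learner" branch: a simple fully connected stack (assumed form).
        self.deep_branch = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

        # Transformer branch: treat each joint's 2D coordinates as a token.
        self.token_embed = nn.Linear(2, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

        # Fully connected head that regresses the 3D pose from the fused features.
        self.head = nn.Linear(embed_dim, out_dim)

    def forward(self, keypoints_2d):
        # keypoints_2d: (batch, num_joints, 2)
        b, j, _ = keypoints_2d.shape

        deep_feat = self.deep_branch(keypoints_2d.reshape(b, -1))  # (b, embed_dim)

        tokens = self.token_embed(keypoints_2d)                    # (b, j, embed_dim)
        trans_feat = self.transformer(tokens).mean(dim=1)          # (b, embed_dim)

        # Additive fusion of the two branches, as described in the abstract.
        fused = deep_feat + trans_feat

        return self.head(fused).reshape(b, j, 3)                   # (b, num_joints, 3)


if __name__ == "__main__":
    model = DeepLearnerTransformer3DPose()
    pose_3d = model(torch.randn(4, 17, 2))  # a batch of 4 frames of 2D keypoints
    print(pose_3d.shape)                    # torch.Size([4, 17, 3])
```

Keeping both branches at the same feature width (embed_dim) is what allows the element-wise addition in the fusion step without an extra projection layer.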