CWPR: An optimized transformer-based model for construction worker pose estimation on construction robots
Journal: Advanced Engineering Informatics (JCR Q1, Computer Science, Artificial Intelligence; PeriodicalIF 8.0)
Publication type: Journal Article
Publication date: 2024-10-01
DOI: 10.1016/j.aei.2024.102894
URL: https://www.sciencedirect.com/science/article/pii/S1474034624005457
Citations: 0
Abstract
Estimating construction workers’ poses is critical for recognizing unsafe behaviors, conducting ergonomic analyses, and assessing productivity. Capturing RGB images with construction robots for pose estimation offers flexible monitoring perspectives and enables timely interventions. However, existing multi-human pose estimation (MHPE) methods struggle to balance accuracy and speed, making them unsuitable for real-time applications on construction robots. This paper introduces the Construction Worker Pose Recognizer (CWPR), an optimized Transformer-based MHPE model tailored for construction robots. CWPR uses a lightweight encoder equipped with a multi-scale feature fusion module to enhance operational speed. An Intersection over Union (IoU)-aware query selection strategy then provides high-quality initial queries for the hybrid decoder, significantly improving performance. In addition, a decoder denoising module incorporates noisy ground truth into the decoder, mitigating sample imbalance and further improving accuracy. The paper also presents the Construction Worker Pose and Action (CWPA) dataset, collected from 154 videos captured in real construction scenarios and annotated for two tasks: a pose benchmark for MHPE and an action benchmark for action recognition. Experiments show that CWPR achieves top-level accuracy and the fastest inference speed, attaining 68.1 Average Precision (AP) with a processing time of 26 ms on the COCO test set and 76.2 AP with 21 ms on the CWPA pose benchmark. Moreover, when integrated with the action recognition method ST-GCN on construction robot hardware, CWPR achieves 78.7 AP and a processing time of 19 ms on the CWPA action benchmark, validating its effectiveness for practical deployment.
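The abstract does not spell out how the IoU-aware query selection works, so the following is only a minimal sketch of the general idea as it is commonly understood: candidate features whose confidence scores track localization quality (IoU with the ground truth) are ranked, and the top-k become the decoder's initial queries. The function names `iou` and `select_queries`, the box format, and the example values are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format (not from the paper): (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_queries(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring candidates.

    Hypothetical helper: in an IoU-aware scheme the scores are trained to
    agree with IoU, so ranking by score favors well-localized candidates.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Illustrative values only: three candidate boxes and one ground-truth box.
candidates = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), (5.0, 5.0, 6.0, 6.0)]
ground_truth = (1.0, 1.0, 3.0, 3.0)

# Here the IoU itself stands in for an IoU-aligned confidence score.
quality = [iou(c, ground_truth) for c in candidates]
chosen = select_queries(quality, k=2)
```

In this toy setup the IoU is used directly as the score; a trained model would instead predict a score that is supervised to correlate with IoU, since ground truth is unavailable at inference time.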
About the journal:
Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitative and quantitative. Abstracting and indexing services for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus, and INSPEC.