{"title":"Dynamic gesture recognition during human–robot interaction in autonomous earthmoving machinery used for construction","authors":"Shiwei Guan, Jiajun Wang, Xiaoling Wang, Chen Ding, Hongyang Liang, Qi Wei","doi":"10.1016/j.aei.2025.103315","DOIUrl":null,"url":null,"abstract":"<div><div>Effective interaction between operators and autonomous earthmoving machinery can accurately convey the rich engineering experience of operators to machines, ensuring efficient human–robot collaboration in construction. In this study, we propose a pipeline for dynamic gesture interaction between authorised operators and autonomous earthmoving machinery. Initially, the autonomous earthmoving machinery preprocessed the video stream using video restoration algorithms if it operated under harsh environmental conditions. Subsequently, the machinery used a safety helmet colour detection algorithm based on YOLOv8 to determine whether an operator has the authorisation to interact with it by recognising the colour of the safety helmet worn by the operator, thereby preventing incorrect operations of the machinery from unauthorised operators. Finally, the autonomous earthmoving machinery utilised the proposed video swin transformer with Adapt multilayer perceptron (AdaptViSwT) dynamic gesture recognition algorithm to recognise dynamic gesture instructions provided by authorised operators and execute the corresponding operations, enabling human–robot collaboration under complex construction conditions. To train the proposed AdaptViSwT effectively, we established a dynamic gesture interaction dataset comprising 6,502 videos that contained nine commonly used instructions for commanding earthmoving machinery. The experiments verified that, on construction-site datasets, the proposed pipeline achieved 91.2% accuracy in detecting authorised worker. In dynamic gesture recognition, it achieved 98.32% accuracy and 98.44% F1-score. These results effectively ensure the safety and reliability of human-robot collaborative construction.</div></div>","PeriodicalId":50941,"journal":{"name":"Advanced Engineering Informatics","volume":"65 ","pages":"Article 103315"},"PeriodicalIF":8.0000,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced Engineering Informatics","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1474034625002083","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Effective interaction between operators and autonomous earthmoving machinery can convey the rich engineering experience of operators to machines, ensuring efficient human–robot collaboration in construction. In this study, we propose a pipeline for dynamic gesture interaction between authorised operators and autonomous earthmoving machinery. First, when operating under harsh environmental conditions, the machinery preprocesses the incoming video stream with video restoration algorithms. Next, it applies a YOLOv8-based safety-helmet colour detection algorithm to determine whether an operator is authorised to interact with it, identified by the colour of the safety helmet worn, thereby preventing incorrect operation triggered by unauthorised operators. Finally, the machinery uses the proposed Video Swin Transformer with Adapt multilayer perceptron (AdaptViSwT) dynamic gesture recognition algorithm to recognise gesture instructions issued by authorised operators and execute the corresponding operations, enabling human–robot collaboration under complex construction conditions. To train AdaptViSwT effectively, we established a dynamic gesture interaction dataset of 6,502 videos covering nine instructions commonly used to command earthmoving machinery. Experiments on construction-site datasets show that the proposed pipeline achieves 91.2% accuracy in detecting authorised workers and, for dynamic gesture recognition, 98.32% accuracy and a 98.44% F1-score. These results support safe and reliable human–robot collaborative construction.
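The pipeline described above reduces to a simple gating structure: restore the video if conditions require it, confirm authorisation from the helmet colour, and only then classify the gesture clip. The Python sketch below illustrates that control flow under stated assumptions: the `ultralytics` YOLO interface is real, but the weights file `helmet_colour.pt`, the class name in `AUTHORISED_COLOURS`, the `restore` hook, and the stand-in `gesture_classifier` callable (used in place of the authors' AdaptViSwT model) are hypothetical placeholders, not the paper's implementation.

```python
from typing import Callable, List, Optional

import numpy as np
from ultralytics import YOLO  # pip install ultralytics

# Hypothetical class names for helmet colours that grant authorisation.
AUTHORISED_COLOURS = {"red_helmet"}


def detect_authorised_operator(frame: np.ndarray, detector: YOLO,
                               conf_threshold: float = 0.5) -> bool:
    """Return True if any detected safety helmet has an authorised colour."""
    result = detector(frame, verbose=False)[0]
    for box in result.boxes:
        if float(box.conf) < conf_threshold:
            continue
        if result.names[int(box.cls)] in AUTHORISED_COLOURS:
            return True
    return False


def run_pipeline(clip: List[np.ndarray],
                 detector: YOLO,
                 gesture_classifier: Callable[[List[np.ndarray]], str],
                 restore: Optional[Callable[[np.ndarray], np.ndarray]] = None
                 ) -> Optional[str]:
    """Gate gesture recognition on operator authorisation, mirroring the three stages."""
    # Stage 1: optional video restoration for harsh conditions (stand-in hook).
    if restore is not None:
        clip = [restore(frame) for frame in clip]
    # Stage 2: authorisation check via helmet-colour detection on the middle frame.
    if not detect_authorised_operator(clip[len(clip) // 2], detector):
        return None  # ignore gestures from unauthorised workers
    # Stage 3: dynamic gesture recognition over the whole clip.
    return gesture_classifier(clip)


if __name__ == "__main__":
    detector = YOLO("helmet_colour.pt")  # hypothetical custom-trained weights
    dummy_clip = [np.zeros((640, 640, 3), dtype=np.uint8) for _ in range(16)]
    print(run_pipeline(dummy_clip, detector, lambda clip: "stop"))
```

Checking authorisation on a single sampled frame keeps the gate cheap in this sketch; a deployed system would more likely aggregate helmet detections across several frames before accepting a command.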
About the journal:
Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitative and quantitative. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.