Excavator 3D pose estimation from point cloud with self-supervised deep learning

IF 8.5; Zone 1 (Engineering & Technology); JCR Q1, Computer Science, Interdisciplinary Applications

Computer-Aided Civil and Infrastructure Engineering. Pub Date: 2025-05-03. DOI: 10.1111/mice.13500

Mingyu Zhang, Wenkang Guo, Jiawen Zhang, Shuai Han, Heng Li, Hongzhe Yue

Citations: 0

Abstract

Pose estimation of excavators is a fundamental yet challenging task with significant implications for intelligent construction. Traditional methods based on cameras or sensors are often limited in their ability to perceive spatial structures. To address this, 3D light detection and ranging (LiDAR) has emerged as a promising paradigm for excavator pose estimation. However, LiDAR-based methods face two significant challenges: (1) accurate 3D pose annotations are labor-intensive and costly to produce, and (2) excavators exhibit complex kinematics and geometric structures, which further complicates pose estimation. In this study, a novel framework is proposed for full-body excavator pose estimation directly from 3D point clouds, without relying on manual 3D annotations. The excavator pose is parameterized using pose parameters of geometric primitives under kinematic constraints. A unified deep network is designed to predict these pose parameters from point clouds. The network is first pre-trained on synthetic data to provide parameter initialization and then fine-tuned on real-world data. To enable label-free training, self-supervised loss functions are designed by exploiting the geometric and kinematic consistency between point clouds and excavators. Experimental results on real-world construction sites demonstrate the effectiveness and robustness of the proposed method, which achieves an average pose estimation accuracy of 0.26 m. The method also exhibits promising performance across various excavator operational scenarios, highlighting its potential for real-world applications.
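
The abstract does not give the paper's actual parameterization or loss functions, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the arm (boom, stick, bucket) can be approximated by three connected line-segment primitives, that the pose is a base position, a yaw angle, and three relative pitch angles, and that a label-free loss can be the mean distance from each LiDAR point to its nearest primitive. The link lengths and all function names are hypothetical.

```python
import numpy as np

# Hypothetical link lengths in metres (boom, stick, bucket); not taken from the paper.
LINK_LENGTHS = (5.7, 2.9, 1.5)


def forward_kinematics(base, yaw, joint_angles, lengths=LINK_LENGTHS):
    """Return the (n_links + 1, 3) joint positions of a serial arm.

    Each joint angle is a pitch relative to the previous link, so consecutive
    links share an endpoint by construction (a simple kinematic constraint).
    """
    heading = np.array([np.cos(yaw), np.sin(yaw), 0.0])  # horizontal arm direction
    up = np.array([0.0, 0.0, 1.0])
    joints = [np.asarray(base, dtype=float)]
    pitch = 0.0
    for angle, length in zip(joint_angles, lengths):
        pitch += angle  # accumulate relative pitch along the chain
        direction = np.cos(pitch) * heading + np.sin(pitch) * up
        joints.append(joints[-1] + length * direction)
    return np.stack(joints)


def point_to_segment_distance(points, a, b):
    """Euclidean distance from each point in an (N, 3) array to segment a-b."""
    ab = b - a
    # Projection parameter of each point onto the segment, clamped to [0, 1].
    t = np.clip((points - a) @ ab / (ab @ ab), 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1)


def geometric_consistency_loss(points, joints):
    """Mean distance from each point to its nearest link segment (no labels needed)."""
    dists = np.stack([
        point_to_segment_distance(points, joints[i], joints[i + 1])
        for i in range(len(joints) - 1)
    ])
    return dists.min(axis=0).mean()


if __name__ == "__main__":
    # Synthetic "scan": points sampled exactly on the links of a known pose.
    true_joints = forward_kinematics([0.0, 0.0, 1.0], 0.3, [0.8, -1.2, -0.9])
    scan = np.concatenate([
        np.linspace(true_joints[i], true_joints[i + 1], 20) for i in range(3)
    ])
    guess = forward_kinematics([0.0, 0.0, 1.0], 0.3, [0.5, -1.2, -0.9])
    print("loss at true pose:   ", geometric_consistency_loss(scan, true_joints))
    print("loss at perturbed pose:", geometric_consistency_loss(scan, guess))
```

In this toy setting the loss is near zero at the pose that generated the points and grows as the pose is perturbed, which is the property a self-supervised objective of this kind relies on. A trainable version would express the same computation in a differentiable framework and would need primitives with volume rather than bare segments.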

Source journal

Computer-Aided Civil and Infrastructure Engineering (Engineering & Technology: Civil Engineering)

CiteScore: 17.60

Self-citation rate: 19.80%

Articles published: 146

Review time: 1 month

Journal description: Computer-Aided Civil and Infrastructure Engineering stands as a scholarly, peer-reviewed archival journal, serving as a vital link between advancements in computer technology and civil and infrastructure engineering. The journal serves as a distinctive platform for the publication of original articles, spotlighting novel computational techniques and inventive applications of computers. Specifically, it concentrates on recent progress in computer and information technologies, fostering the development and application of emerging computing paradigms. Encompassing a broad scope, the journal addresses bridge, construction, environmental, highway, geotechnical, structural, transportation, and water resources engineering. It extends its reach to the management of infrastructure systems, covering domains such as highways, bridges, pavements, airports, and utilities. The journal delves into areas like artificial intelligence, cognitive modeling, concurrent engineering, database management, distributed computing, evolutionary computing, fuzzy logic, genetic algorithms, geometric modeling, internet-based technologies, knowledge discovery and engineering, machine learning, mobile computing, multimedia technologies, networking, neural network computing, optimization and search, parallel processing, robotics, smart structures, software engineering, virtual reality, and visualization techniques.