Li An, Pengbo Zhou, Mingquan Zhou, Yong Wang, Guohua Geng, Wuyang Shui, Wen Tang
Title: Spatial-Temporal Transformer for point cloud registration in digital modeling of complex environments
Journal: Displays, volume 90, Article 103139 (impact factor 3.4; JCR Q1, Computer Science, Hardware & Architecture)
DOI: 10.1016/j.displa.2025.103139
Published: 2025-07-03
URL: https://www.sciencedirect.com/science/article/pii/S0141938225001763
Citations: 0
Abstract
Building sustainable cities and societies requires precise spatial data to support high-accuracy digital modeling and environmental analysis. Terrestrial Laser Scanning (TLS) provides detailed 3D point cloud data, but these data are often segmented into multiple local datasets due to measurement range and environmental limitations, making point cloud registration a critical step for achieving comprehensive environmental representation. However, point cloud registration faces challenges in low-overlap, large-scale, and cross-dataset scenarios. To address these issues, this paper proposes a Spatial-Temporal Transformer-based point cloud registration method (TransPCR), designed specifically for multi-temporal data fusion in complex urban environments. The key innovation of this method is the use of dual-branch position encoding and a Spatial-Temporal Transformer for multi-level point cloud information interaction. The dual-branch position encoding combines local features and coordinates, enhancing the model’s ability to represent complex spatial structures and improving accuracy in low-overlap scenarios. The core Spatial-Temporal Transformer module further facilitates interaction between local positions and features, enabling the model to meet large-scale registration requirements. Additionally, the Temporal Transformer module achieves local-to-global fusion, promoting the learning and extraction of internal point cloud features. Tested on the 3DMatch and KITTI datasets and validated on the WHU-TLS and ETH datasets, which include complex scenes such as urban areas, rivers, and forests, TransPCR demonstrates outstanding registration accuracy, indicating its potential in multi-source data integration and applications within complex environments.
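The abstract reports registration accuracy without naming the metrics. On benchmarks such as 3DMatch and KITTI, accuracy is commonly expressed as a rotation error and a translation error between the estimated and ground-truth rigid transforms. The sketch below illustrates those two quantities under that assumption; it is not the TransPCR evaluation code, and the function names and sample values are hypothetical.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation matrix about the z axis (illustrative helper)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_error_deg(r_est, r_gt):
    """Angle in degrees of the residual rotation R_gt^T R_est."""
    # trace(R_gt^T R_est) equals the element-wise product sum of the two matrices.
    trace = sum(r_gt[i][j] * r_est[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))

# Hypothetical ground-truth and estimated transforms (made-up values).
r_gt = rotation_z(math.radians(30.0))
r_est = rotation_z(math.radians(32.0))
t_gt = [1.0, 2.0, 0.5]
t_est = [1.0, 2.3, 0.1]

rre = rotation_error_deg(r_est, r_gt)
rte = translation_error(t_est, t_gt)
print(f"rotation error: {rre:.2f} deg, translation error: {rte:.2f}")
```

A registration is typically counted as successful when both errors fall below dataset-specific thresholds; the thresholds used in the paper are not given in the abstract.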
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.