Vision-guided robot calibration using photogrammetric methods
Markus Ulrich, Carsten Steger, Florian Butsch, Maurice Liebe
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 645–662. Published 2024-10-04. DOI: 10.1016/j.isprsjprs.2024.09.037

We propose novel photogrammetry-based robot calibration methods for industrial robots that are guided by cameras or 3D sensors. Compared to state-of-the-art methods, our methods are capable of calibrating the robot kinematics, the hand–eye transformations, and, for camera-guided robots, the interior orientation of the camera simultaneously. Our approach uses a minimal parameterization of the robot kinematics and hand–eye transformations. Furthermore, it uses a camera model that is capable of handling a large range of complex lens distortions that can occur in cameras that are typically used in machine vision applications. To determine the model parameters, geometrically meaningful photogrammetric error measures are used. They are independent of the parameterization of the model and typically result in a higher accuracy. We apply a stochastic model for all parameters (observations and unknowns), which allows us to assess the precision and significance of the calibrated model parameters. To evaluate our methods, we propose novel procedures that are relevant in real-world applications and do not require ground truth values. Experiments on synthetic and real data show that our approach improves the absolute positioning accuracy of industrial robots significantly. By applying our approach to two different uncalibrated UR3e robots, one guided by a camera and one by a 3D sensor, we were able to reduce the RMS evaluation error by approximately 85% for each robot.
A phenological-knowledge-independent method for automatic paddy rice mapping with time series of polarimetric SAR images
Suya Lin, Zhixin Qi, Xia Li, Hui Zhang, Qianwen Lv, Di Huang
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 628–644. Published 2024-10-04. DOI: 10.1016/j.isprsjprs.2024.09.035

Paddy rice, which sustains more than half of the global population, requires accurate and efficient mapping to ensure food security. Synthetic aperture radar (SAR) has become indispensable in this process due to its remarkable ability to operate effectively in adverse weather conditions and its sensitivity to paddy rice growth. Phenological-knowledge-based (PKB) methods have been commonly employed in conjunction with time series of SAR images for paddy rice mapping, primarily because they eliminate the need for training datasets. However, PKB methods possess inherent limitations, primarily stemming from their reliance on precise phenological information about paddy rice growth. This information varies across regions and paddy rice varieties, making it challenging to apply PKB methods effectively on a large spatial scale, such as the national or global scale, where collecting comprehensive phenological data becomes impractical. Moreover, variations in farming practices and field conditions can lead to differences in paddy rice growth stages even within the same region, so a generalized set of phenological knowledge may not suit all paddy fields, potentially resulting in errors in paddy rice extraction. To address these challenges, this study proposed a phenological-knowledge-independent (PKI) method for mapping paddy rice using time series of Sentinel-1 SAR images. The central innovation of the PKI method lies in its capability to map paddy rice without relying on specific knowledge of paddy rice phenology or a training dataset. This was made possible by three novel metrics: the VH and VV normalized maximum temporal changes (NMTC) and the VH temporal mean, derived from the distinctions between paddy rice and other land cover types in time series of SAR images. The PKI method was rigorously evaluated across three regions in China, each featuring different paddy rice varieties, and was compared with two prevalent phenological-knowledge-based techniques: the automated paddy rice mapping method using SAR flooding signals (ARM-SARFS) and the manual interpretation of unsupervised clustering results (MI-UCR). The PKI method achieved an average overall accuracy of 97.99%, surpassing the ARM-SARFS, which recorded an accuracy of 89.65% due to errors stemming from phenological disparities among different paddy fields. Furthermore, the PKI method delivered results on par with the MI-UCR, which relied on the fusion of SAR and optical image time series and achieved an accuracy of 97.71%. As demonstrated by these findings, the PKI method proves highly effective in mapping paddy rice across diverse regions without the need for phenological knowledge or a training dataset, and consequently holds substantial promise for efficiently mapping paddy rice on a large spatial scale. The source code used in this study is availa
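
The abstract names the three metrics but not their formulas. The sketch below assumes NMTC = (max − min) / (max + min) over the time series and uses invented thresholds; it is only meant to illustrate how a training-free, phenology-free decision rule could be built from such per-pixel metrics.

```python
import numpy as np

# Hypothetical sketch of the three per-pixel metrics named in the abstract,
# computed from a Sentinel-1 stack of shape (T, H, W) in linear backscatter.
# The exact NMTC definition is not given in the abstract; (max-min)/(max+min)
# is an assumption, as are all threshold values below.

def nmtc(stack):
    smax, smin = stack.max(axis=0), stack.min(axis=0)
    return (smax - smin) / (smax + smin + 1e-12)

def paddy_candidates(vh, vv, nmtc_vh_min=0.5, nmtc_vv_min=0.4, vh_mean_max=0.02):
    """Flag pixels whose temporal signature resembles flooded-then-growing rice:
    large temporal change in both polarizations, low mean VH backscatter."""
    return (
        (nmtc(vh) > nmtc_vh_min)
        & (nmtc(vv) > nmtc_vv_min)
        & (vh.mean(axis=0) < vh_mean_max)
    )
```
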
{"title":"Variational Autoencoder with Gaussian Random Field prior: Application to unsupervised animal detection in aerial images","authors":"Hugo Gangloff , Minh-Tan Pham , Luc Courtrai , Sébastien Lefèvre","doi":"10.1016/j.isprsjprs.2024.09.028","DOIUrl":"10.1016/j.isprsjprs.2024.09.028","url":null,"abstract":"<div><div>In real world datasets of aerial images, the objects of interest are often missing, hard to annotate and of varying aspects. The framework of unsupervised Anomaly Detection (AD) is highly relevant in this context, and Variational Autoencoders (VAEs), a family of popular probabilistic models, are often used. We develop on the literature of VAEs for AD in order to take advantage of the particular textures that appear in natural aerial images. More precisely we propose a new VAE model with a Gaussian Random Field (GRF) prior (VAE-GRF), which generalizes the classical VAE model, and we provide the necessary procedures and hypotheses required for the model to be tractable. We show that, under some assumptions, the VAE-GRF largely outperforms the traditional VAE and some other probabilistic models developed for AD. Our results suggest that the VAE-GRF could be used as a relevant VAE baseline in place of the traditional VAE with very limited additional computational cost. We provide competitive results on the MVTec reference dataset for visual inspection, and two other datasets dedicated to the task of unsupervised animal detection in aerial images.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 600-609"},"PeriodicalIF":10.6,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OR-LIM: Observability-aware robust LiDAR-inertial-mapping under high dynamic sensor motion
Yangzi Cong, Chi Chen, Bisheng Yang, Ruofei Zhong, Shangzhe Sun, Yuhang Xu, Zhengfei Yan, Xianghong Zou, Zhigang Tu
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 610–627. Published 2024-10-03. DOI: 10.1016/j.isprsjprs.2024.09.036

Light Detection And Ranging (LiDAR) technology has provided an impactful way to capture 3D data. However, consistent mapping in sensing-degenerated and perceptually limited scenes (e.g., multi-story buildings) or under high dynamic sensor motion (e.g., on a rotating platform) remains a significant challenge. In this paper, we present OR-LIM, a novel observability-aware LiDAR-inertial-mapping system. Essentially, it combines a robust real-time LiDAR-inertial odometry (LIO) module with an efficient surfel-map-smoothing (SMS) module that seamlessly optimizes the sensor poses and scene geometry at the same time. To improve robustness, planar surfels are hierarchically generated and grown from point cloud maps to provide reliable correspondences for fixed-lag optimization. Moreover, the normals of the surfels are analyzed to evaluate the observability of each frame. To maintain global consistency, a factor graph integrates the information from IMU propagation, the LIO, and the SMS. The system is extensively tested on datasets collected by a low-cost multi-beam LiDAR (MBL) mounted on a rotating platform. Experiments with various settings of sensor motion, conducted in complex multi-story buildings and large-scale outdoor scenes, demonstrate the superior performance of our system over multiple state-of-the-art methods. With reference to a collected Terrestrial Laser Scanning (TLS) map, point accuracy improves by 3.39–13.6% (8.71% on average) outdoors and by 1.89–15.88% (9.09% on average) indoors.
Re-evaluating winter carbon sink in Southern Ocean by recovering MODIS-Aqua chlorophyll-a product at high solar zenith angles
Ke Zhang, Zhaoru Zhang, Jianfeng He, Walker O. Smith, Na Liu, Chengfeng Le
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 588–599. Published 2024-10-02. DOI: 10.1016/j.isprsjprs.2024.09.033

Satellite ocean color observations are extensively utilized in global carbon sink evaluation. However, the valid coverage of chlorophyll-a concentration (Chla, mg m⁻³) measurements from these observations is severely limited during autumn and winter in high-latitude oceans. The high solar zenith angle (SZA) is one of the primary causes of the reduced quality of Chla products in the high-latitude Southern Ocean during these seasons. This study addresses this challenge by employing a random forest-based regression ensemble (RFRE) method to enhance the quality of Moderate Resolution Imaging Spectroradiometer (MODIS) Chla products affected by high-SZA conditions. The RFRE model incorporates the color index (CI), band-ratio index (R), SZA, sensor zenith angle (senz), and Rayleigh-corrected reflectance at 869 nm (Rrc(869)) as predictors. The results indicate that the RFRE model significantly increased the valid MODIS Chla coverage (by a factor of 1.03 to 3.24) in high-latitude Southern Ocean regions, at a quality comparable to the standard Chla products. Re-evaluating the Southern Ocean carbon sink with the recovered Chla shows that the Southern Ocean's capacity to absorb carbon dioxide (CO₂) in winter has been underestimated by 5.9–18.6 Tg C year⁻¹ in previous assessments. This study underscores the significance of improving Chla products for a more accurate estimation of the winter carbon sink in the Southern Ocean.
A new methodology for establishing an SOC content prediction model that is spatiotemporally transferable at multidecadal and intercontinental scales
Xiangtian Meng, Yilin Bao, Chong Luo, Xinle Zhang, Huanjun Liu
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 531–550. Published 2024-10-02. DOI: 10.1016/j.isprsjprs.2024.09.038

Quantifying and tracking soil organic carbon (SOC) content is a key step toward long-term terrestrial ecosystem monitoring. Over the past decade, numerous models have been proposed and have achieved promising results for predicting SOC content. However, many of these studies are confined to specific temporal or spatial contexts, neglecting model transferability. Temporal transferability refers to a model's ability to be applied across different periods, while spatial transferability relates to its applicability across diverse geographic locations. Therefore, developing a new methodology to establish a prediction model with high spatiotemporal transferability for SOC content is critically important. In this study, two large intercontinental study areas were selected, and measured topsoil (0–20 cm) sample data, 27,059 cloudless Landsat 5/8 images, digital elevation models, and climate data were acquired for three periods. From these data, monthly average climate data, monthly average data reflecting soil properties, and topography data were derived as original input (OI) variables. We established an innovative multivariate deep learning model with high spatiotemporal transferability, combining the advantages of an attention mechanism, a graph neural network, and a long short-term memory network (A-GNN-LSTM). Additionally, the spatiotemporal transferability of the A-GNN-LSTM and of commonly used prediction models was compared. Finally, the abilities of the OI variables and of the OI variables processed by feature engineering (FEI) were explored for different SOC prediction models. The results show that: 1) The A-GNN-LSTM using OI as input variables was the optimal prediction model (RMSE = 4.86 g kg⁻¹, R² = 0.81, RPIQ = 2.46, and MAE = 3.78 g kg⁻¹), with the highest spatiotemporal transferability. 2) Compared to the GNN, the A-GNN-LSTM demonstrates superior temporal transferability (ΔR²_T = −0.10 vs. −0.07), and compared to the LSTM, it shows enhanced spatial transferability (ΔR²_S = −0.16 vs. −0.09). These findings strongly suggest that the fusion of geospatial context and temporally dependent information, extracted through the integration of the GNN and LSTM models, effectively enhances the spatiotemporal transferability of the model. 3) By introducing the attention mechanism, the weights of the different input variables could be calculated, increasing the physical interpretability of the deep learning model. The largest weight was assigned to climate data (39.55%) and the smallest to vegetation (19.96%). 4) Among the commonly used prediction models, the deep learning model had higher prediction accuracy (RMSE = 6.64 g kg⁻¹, R² = 0.64, RPIQ = 1.78, and MAE = 4.78 g kg⁻¹) and spatial transferability (ΔRMSE_S = 1.
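
Finding 3 above turns on group-level attention weights being directly readable. The sketch below shows one simple way such interpretable weights arise: encode each input-variable group (climate, soil, topography, vegetation, ...), score the groups with a softmax, and fuse. This is not the paper's A-GNN-LSTM; all dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: attention over input-variable groups yields per-group
# weights (cf. climate 39.55%, vegetation 19.96% in the abstract).

class GroupAttention(nn.Module):
    def __init__(self, group_dims, hidden=32):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(d, hidden) for d in group_dims)
        self.score = nn.Linear(hidden, 1)

    def forward(self, groups):                   # list of (batch, d_i) tensors
        h = torch.stack(
            [torch.tanh(enc(g)) for enc, g in zip(self.encoders, groups)], dim=1
        )                                        # (batch, n_groups, hidden)
        w = torch.softmax(self.score(h).squeeze(-1), dim=1)   # (batch, n_groups)
        fused = (w.unsqueeze(-1) * h).sum(dim=1)              # weighted fusion
        return fused, w                          # averaging w gives the reported weights
```
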
Automated localization of dike leakage outlets using UAV-borne thermography and YOLO-based object detectors
Renlian Zhou, Monjee K. Almustafa, Moncef L. Nehdi, Huaizhi Su
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 551–573. Published 2024-10-02. DOI: 10.1016/j.isprsjprs.2024.09.039

Leakage-induced soil erosion poses a major threat of dike failure, particularly during floods. Timely detection of leakage outlets and notification of dike management are crucial for ensuring dike safety. However, manual inspection, currently the main approach for identifying leakage outlets, is costly, inefficient, and lacks spatial coverage. To achieve efficient and automatic localization of dike leakage outlets, an innovative strategy combining drones, infrared thermography, and deep learning is presented. Drones are employed to sense the dike surface, and real-time images from the drones are sent to a server where well-trained detectors are deployed. Once a leakage outlet is detected, an alarm is sent remotely to dike managers. To realize this strategy, four thermal imagers were employed to image leakage outlets on several model dikes and actual dikes. 9,231 hand-labeled thermal images with 13,387 leakage objects were selected for analysis, and 19 detectors were trained using transfer learning. The best detector achieved a mean average precision of 95.8% on the challenging test set. A full-scale embankment was constructed for leakage outlet detection tests, and various field tests confirmed the efficiency of the proposed localization method. Under some tough conditions, the trained detector also clearly outperformed manual judgment. Results indicate that under typical circumstances the localization error of the proposed method is within 5 m, demonstrating its practical reliability. Finally, the influencing factors and limits of the suggested strategy are thoroughly examined.
ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification
Pan Zhang, Baochai Peng, Chaoran Lu, Quanjin Huang, Dongsheng Liu
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 574–587. Published 2024-10-02. DOI: 10.1016/j.isprsjprs.2024.09.025

Synthetic Aperture Radar (SAR) images have proven to be a valuable cue for multimodal Land Cover Classification (LCC) when combined with RGB images. Most existing studies on cross-modal fusion assume that consistent feature information is necessary between the two modalities, and as a result they construct networks without adequately addressing the unique characteristics of each modality. In this paper, we propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet), which introduces asymmetry at the feature level to address the issue that multimodal architectures frequently fail to fully utilize complementary features. The core of this network is the Semantic Focusing Module (SFM), which explicitly calculates differential weights for each modality to account for modality-specific features. Furthermore, ASANet incorporates a Cascade Fusion Module (CFM), which delves deeper into channel and spatial representations to efficiently select features from the two modalities for fusion. Through the collaborative effort of these two modules, the proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences. Comprehensive experiments demonstrate that ASANet achieves excellent performance on three multimodal datasets. Additionally, we have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%. ASANet runs at 48.7 frames per second (FPS) on 256 × 256 pixel inputs.
VNI-Net: Vector neurons-based rotation-invariant descriptor for LiDAR place recognition
Gengxuan Tian, Junqiao Zhao, Yingfeng Cai, Fenglin Zhang, Xufei Wang, Chen Ye, Sisi Zlatanova, Tiantian Feng
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 506–517. Published 2024-10-01. DOI: 10.1016/j.isprsjprs.2024.09.011

Despite the emergence of various LiDAR-based place recognition methods, place recognition failure due to rotation remains a critical challenge. Existing studies have attempted to address this limitation through specific training strategies involving data augmentation and rotation-invariant networks. However, augmenting 3D rotations (SO(3)) is impractical for the former, while the latter primarily focuses on the reduced problem of 2D rotation (SO(2)) invariance. Existing methods targeting SO(3) rotation invariance suffer from limited discriminative capability. In this paper, we propose a novel approach (VNI-Net) based on the Vector Neurons Network (VNN) to achieve SO(3) rotation invariance. Our method begins by extracting rotation-equivariant features from neighboring points and projecting these low-dimensional features into a high-dimensional space using the VNN. We then compute both Euclidean and cosine distances in the rotation-equivariant feature space to obtain rotation-invariant features. Finally, we aggregate these features using generalized-mean (GeM) pooling to generate the global descriptor. To mitigate the significant information loss associated with formulating rotation-invariant features, we propose computing distances between features at different layers within the Euclidean-space neighborhood. This approach significantly enhances the discriminability of the descriptors while maintaining computational efficiency. We conduct experiments on multiple publicly available datasets captured with vehicle-mounted, drone-mounted, and handheld LiDAR sensors. VNI-Net outperforms baseline methods by up to 15.3% on datasets with rotation, while achieving results comparable to state-of-the-art place recognition methods on datasets with less rotation. Our code is open-sourced at https://github.com/tiev-tongji/VNI-Net.
A boundary-aware point clustering approach in Euclidean and embedding spaces for roof plane segmentation
Li Li, Qingqing Li, Guozheng Xu, Pengwei Zhou, Jingmin Tu, Jie Li, Mingming Li, Jian Yao
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 518–530. Published 2024-10-01. DOI: 10.1016/j.isprsjprs.2024.09.030

Roof plane segmentation from airborne light detection and ranging (LiDAR) point clouds is an important technology for three-dimensional (3D) building model reconstruction. One of the key issues in plane segmentation is how to design powerful features that can exactly distinguish adjacent planar patches; the quality of the point features directly determines the accuracy of roof plane segmentation. Most existing approaches use handcrafted features, such as point-to-plane distance and normal vector, to extract roof planes. However, the discriminative power of these features is relatively low, especially in boundary areas. To solve this problem, we propose a boundary-aware point clustering approach in Euclidean and embedding spaces constructed by a multi-task deep network for roof plane segmentation. We design a three-branch multi-task network to predict semantic labels, point offsets, and deep embedding features. In the first branch, we classify the input data as non-roof, boundary, and plane points. In the second branch, we predict point offsets that shift each point toward its respective instance center. In the third branch, we constrain points of the same plane instance to have similar embeddings. The goal is to ensure that points of the same plane instance are as close as possible in both Euclidean and embedding spaces. However, although a deep network has strong feature representation ability, it is still hard to accurately distinguish points near plane instance boundaries. Therefore, we first robustly group plane points into clusters in the Euclidean and embedding spaces to find candidate planes, and then assign the remaining boundary points to their closest clusters to generate the final complete roof planes. In this way, we effectively reduce the influence of unreliable boundary points. In addition, to train the network and evaluate the performance of our approach, we prepared a synthetic dataset and two real datasets. Experiments conducted on these datasets show that the proposed approach significantly outperforms existing state-of-the-art approaches in both qualitative evaluation and quantitative metrics. To facilitate future research, we will make the datasets and source code of our approach publicly available at https://github.com/Li-Li-Whu/DeepRoofPlane.