Target-aware attentional network for rare class segmentation in large-scale LiDAR point clouds

IF 10.6 · CAS Zone 1 (Earth Sciences) · JCR Q1, GEOGRAPHY, PHYSICAL

ISPRS Journal of Photogrammetry and Remote Sensing · Pub Date: 2025-02-01 · DOI: 10.1016/j.isprsjprs.2024.11.012

Xinlong Zhang, Dong Lin, Uwe Soergel

Volume 220, Pages 32-50. Full text: https://www.sciencedirect.com/science/article/pii/S0924271624004222

Citations: 0

Abstract

Semantic interpretation of 3D scenes is a formidable challenge in point cloud processing and a prerequisite for many applications that involve point clouds. Although a number of point cloud segmentation methods have achieved leading performance, 3D rare class segmentation remains difficult owing to the imbalanced distribution of fine-grained classes and the complexity of large scenes. In this paper, we present the target-aware attentional network (TaaNet), a novel mask-constrained attention framework for 3D semantic segmentation of imbalanced classes in large-scale point clouds. Adapting the self-attention mechanism, a hierarchical aggregation strategy is first applied to enhance the learning of point-wise features across various scales; it leverages both global and local perspectives to guarantee the presence of fine-grained patterns in highly complex scenes. Subsequently, a contextual module imposes rare target masks on the hierarchical features. Specifically, a target-aware aggregator is proposed to boost the discriminative features of rare classes; it constrains the hierarchical features with learnable adaptive weights and simultaneously embeds confidence constraints of rare classes. Furthermore, a target pseudo-labeling strategy based on strong contour cues of rare classes is designed, which delivers instance-level supervisory signals restricted to rare targets only. We conducted thorough experiments on four LiDAR benchmarks acquired from airborne, mobile and terrestrial platforms to assess the performance of our framework. Results demonstrate that, compared to other commonly used advanced segmentation methods, our method obtains not only high segmentation accuracy but also remarkable F1-scores in rare classes. In a submission to the official ranking page of the Hessigheim 3D benchmark, our approach achieves a state-of-the-art mean F1-score of 83.84% and an overall accuracy (OA) of 90.45%. In particular, the F1-scores of the rare classes, namely vehicles and chimneys, exceed the average of other published methods by a wide margin, by 32.00% and 32.46%, respectively. Additionally, extensive experimental analysis on three further benchmarks collected from multiple platforms, Paris-Lille-3D, Semantic3D and WHU-Urban3D, validates the robustness and effectiveness of the proposed method.
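The abstract reports overall accuracy (OA), mean F1-score and per-class F1-scores. As a reading aid, the sketch below shows how these standard metrics are computed from a confusion matrix and why OA alone can hide weak performance on a rare class. The function names, the three class names and all counts are hypothetical and chosen for illustration; they are not taken from the paper or from any benchmark's evaluation code.

```python
from typing import List

Matrix = List[List[int]]  # conf[i][j]: points of true class i predicted as class j


def per_class_f1(conf: Matrix) -> List[float]:
    """Return the F1-score of every class, using F1 = 2TP / (2TP + FP + FN)."""
    n = len(conf)
    scores = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp   # other classes predicted as c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(conf: Matrix) -> float:
    """Unweighted average of the per-class F1-scores."""
    scores = per_class_f1(conf)
    return sum(scores) / len(scores)


def overall_accuracy(conf: Matrix) -> float:
    """Fraction of all points whose predicted class equals the true class."""
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(len(conf)))
    return correct / total if total else 0.0


# Made-up counts for three classes: ground, building, vehicle (the rare class).
toy = [
    [9000, 100, 0],
    [200, 4800, 0],
    [60, 20, 20],
]

if __name__ == "__main__":
    print("OA:", round(overall_accuracy(toy), 4))
    print("per-class F1:", [round(s, 4) for s in per_class_f1(toy)])
    print("mean F1:", round(mean_f1(toy), 4))
```

In this toy case the rare class contributes only 100 of 14,200 points, so OA stays high even though four out of five rare-class points are misclassified; the per-class F1 and the unweighted mean F1 expose the gap, which is why rare-class results are usually reported with F1 rather than OA alone.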




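Source journal

ISPRS Journal of Photogrammetry and Remote Sensing (Engineering and Technology: Imaging Science and Photographic Technology)

CiteScore: 21.00

Self-citation rate: 6.30%

Articles published: 273

Review time: 40 days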

About the journal: The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) is the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who work in disciplines that use photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also serving as a comprehensive source of reference and archive. P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields. In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.