3DLST: 3D Learnable Supertoken Transformer for LiDAR point cloud scene segmentation

Impact factor: 7.6 · JCR: Q1 · Category: Remote Sensing
Dening Lu, Linlin Xu, Jun Zhou, Kyle (Yilin) Gao, Jonathan Li
Journal: International journal of applied earth observation and geoinformation : ITC journal, volume 140, Article 104572
Published: 2025-05-14
DOI: 10.1016/j.jag.2025.104572
URL: https://www.sciencedirect.com/science/article/pii/S1569843225002195
Citations: 0

Abstract

3D Transformers have achieved great success in point cloud understanding and representation. However, there is still considerable room for developing effective and efficient Transformers for large-scale LiDAR point cloud scene segmentation. This paper proposes a novel 3D Transformer framework, named 3D Learnable Supertoken Transformer (3DLST). The key contributions are as follows. First, we introduce the first Dynamic Supertoken Optimization (DSO) block for efficient token clustering and aggregation, in which the learnable supertoken definition avoids the time-consuming pre-processing of traditional superpoint generation. Because the learnable supertokens are dynamically optimized by multi-level deep features during network learning, they are tailored to semantic-homogeneity-aware token clustering. Second, an efficient Cross-Attention-guided Upsampling (CAU) block is proposed for token reconstruction from the optimized supertokens. Third, 3DLST adopts a novel W-net architecture instead of the common U-net design, which is more suitable for Transformer-based feature learning. State-of-the-art performance on challenging LiDAR datasets, namely airborne MultiSpectral LiDAR (MS-LiDAR) (average F1 score of 89.3%) and DALES (mIoU of 80.2%), demonstrates the superiority of 3DLST. Furthermore, 3DLST is also efficient, running up to 5× faster than previous best-performing methods.
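The abstract names the two attention-based steps but gives no equations, so the following is only a rough sketch of the general idea, not the authors' DSO or CAU implementation. It assumes plain scaled dot-product cross-attention without learned projections, multiple heads, positional encodings, or the W-net layout. The function names (`cross_attention`, `aggregate_to_supertokens`, `upsample_from_supertokens`) are hypothetical, and the supertokens are fixed random vectors here, whereas in the paper they are learnable and optimized during training.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query attends over all keys."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def aggregate_to_supertokens(tokens, supertokens):
    # Assumption: supertokens act as queries, point tokens as keys and values,
    # so N tokens are softly clustered into M updated supertokens.
    return cross_attention(supertokens, tokens, tokens)


def upsample_from_supertokens(tokens, supertokens):
    # Assumption: point tokens act as queries, supertokens as keys and values,
    # so N token features are reconstructed from M supertokens.
    return cross_attention(tokens, supertokens, supertokens)


random.seed(0)
N, M, D = 12, 3, 4  # toy sizes: 12 tokens, 3 supertokens, 4 feature channels
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
# Stand-in for learnable parameters; a real model would train these.
supertokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]

updated = aggregate_to_supertokens(tokens, supertokens)
reconstructed = upsample_from_supertokens(tokens, updated)
print(len(updated), len(updated[0]))              # 3 4
print(len(reconstructed), len(reconstructed[0]))  # 12 4
```

Attending over M supertokens rather than all N tokens costs on the order of N·M instead of N², which is one plausible reading of the efficiency claim, although the abstract does not state the complexity explicitly.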
Source journal
International journal of applied earth observation and geoinformation : ITC journal
Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences
CiteScore: 12.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 77 days
About the journal: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.