Dual Geometry Learning and Adaptive Sparse Attention for Point Cloud Analysis

IF 11.1 | Zone 1, Engineering & Technology | Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-21 DOI:10.1109/TCSVT.2025.3553537

Ce Zhou;Qiang Ling

Volume 35, Issue 9, pp. 9075-9089 | Journal Article | https://ieeexplore.ieee.org/document/10937071/

Citations: 0

Abstract

Point cloud analysis is essential for accurately perceiving and understanding real-world scenes. Transformer-based models have recently shown strong performance across diverse domains. Nonetheless, applying transformers directly to point clouds remains challenging, primarily because of their computational cost, which can significantly limit their effectiveness. Moreover, most methods rely on the relative 3D coordinates of point pairs to generate geometric information and do not fully exploit the inherent local geometric properties. To tackle these challenges, we propose DGAS-Net, a novel architecture for point cloud analysis. Specifically, we propose a Dual Geometry Learning (DGL) module that generates explicit geometric descriptors from triangular representations. These descriptors capture the local shape and geometric details around each point and serve as the foundation for deriving informative geometric features. We then introduce a Dual Geometry Context Aggregation (DGCA) module to efficiently merge local geometric and semantic information. Furthermore, we design an Adaptive Sparse Attention (ASA) module to capture long-range information and expand the effective receptive field. ASA adaptively selects globally representative points and employs a novel vector attention mechanism for efficient global information fusion, thereby significantly reducing computational complexity. Extensive experiments on four datasets demonstrate the superiority of DGAS-Net on various point cloud analysis tasks. The code for DGAS-Net is available at https://github.com/zcustc-10/DGAS-Net
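The sparse-attention idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how representative points are selected or how the vector attention is parameterised, so everything here is an assumption made for illustration: the linear scoring rule, the top-m selection, the function name `sparse_vector_attention`, and the weight matrices are hypothetical and are not the authors' implementation (see the linked repository for that). The sketch only shows why attending to m selected points instead of all N reduces the cost from roughly O(N^2) to O(N*m).

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sparse_vector_attention(feats, w_score, w_q, w_k, w_v, w_attn, m):
    """Attend from all N points to m selected points.

    feats:   (N, C) per-point features
    w_score: (C,)   hypothetical linear scoring weights (assumption)
    w_q, w_k, w_v, w_attn: (C, C) projection matrices (assumption)
    Returns the fused features (N, C) and the selected indices (m,).
    """
    # Assumed selection rule: score each point and keep the top-m.
    scores = feats @ w_score                      # (N,)
    idx = np.argsort(-scores)[:m]                 # (m,)

    q = feats @ w_q                               # (N, C)
    k = feats[idx] @ w_k                          # (m, C)
    v = feats[idx] @ w_v                          # (m, C)

    # Vector attention: one weight per channel, derived from the
    # query-key difference rather than a scalar dot product.
    rel = q[:, None, :] - k[None, :, :]           # (N, m, C)
    weights = softmax(rel @ w_attn, axis=1)       # normalised over the m points
    out = (weights * v[None, :, :]).sum(axis=1)   # (N, C)
    return out, idx


# Usage with random data: 1024 points, 16 channels, 32 selected points.
rng = np.random.default_rng(0)
N, C, M = 1024, 16, 32
feats = rng.standard_normal((N, C))
w_score = rng.standard_normal(C)
w_q, w_k, w_v, w_attn = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))
out, idx = sparse_vector_attention(feats, w_score, w_q, w_k, w_v, w_attn, M)
print(out.shape, idx.shape)
```

The attention tensor in this sketch has shape (N, m, C), so memory and compute grow linearly in N for a fixed m. Full self-attention over all points would need an (N, N, C) tensor under the same formulation.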


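Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months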

Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.