Title: Dual Geometry Learning and Adaptive Sparse Attention for Point Cloud Analysis
Authors: Ce Zhou; Qiang Ling
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9075-9089
Publication date: 2025-03-21
Publication type: Journal Article
DOI: 10.1109/TCSVT.2025.3553537
URL: https://ieeexplore.ieee.org/document/10937071/
Impact factor: 11.1 (JCR Q1, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Point cloud analysis is essential for accurately perceiving and analyzing real-world scenarios. Recently, transformer-based models have demonstrated superior performance in diverse domains. Nonetheless, directly applying transformers to point clouds remains challenging, primarily because of their computational cost, which may significantly compromise their efficacy. Moreover, most methods rely on the relative 3D coordinates of point pairs to generate geometric information, without fully exploiting the inherent local geometric properties. To tackle these challenges, we propose DGAS-Net, a novel architecture for point cloud analysis. Specifically, we propose a Dual Geometry Learning (DGL) module that generates explicit geometric descriptors from triangular representations. These descriptors capture the local shape and geometric details around each point and serve as the foundation for deriving informative geometric features. We then introduce a Dual Geometry Context Aggregation (DGCA) module to efficiently merge local geometric and semantic information. Furthermore, we design an Adaptive Sparse Attention (ASA) module to capture long-range information and expand the effective receptive field. ASA adaptively selects globally representative points and employs a novel vector attention mechanism for efficient global information fusion, thereby significantly reducing the computational complexity. Extensive experiments on four datasets demonstrate the superiority of DGAS-Net on various point cloud analysis tasks. The code for DGAS-Net is available at https://github.com/zcustc-10/DGAS-Net
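The abstract does not spell out how ASA works internally, so the following is only an illustrative sketch of the general idea it names: every point attends to a small set of selected points instead of to all points, which cuts the attention cost from roughly N x N to N x m. Everything here is an assumption made for illustration and is not taken from DGAS-Net: the function name `sparse_vector_attention`, the externally supplied `scores`, the top-m selection rule, and the simplified per-channel (vector) weights computed directly from the query-key difference without the learned MLP that vector attention usually applies.

```python
import numpy as np

def sparse_vector_attention(feats, scores, m, w_q, w_k, w_v):
    """Attend from all N points to the m highest-scoring points.

    feats:  (N, C) per-point features
    scores: (N,) hypothetical importance score per point (assumption)
    m:      number of representative points to keep
    w_q, w_k, w_v: (C, C) projection matrices
    """
    # Keep the m points with the largest scores. This top-m rule is a
    # stand-in for adaptive selection, not the paper's procedure.
    idx = np.argsort(-scores)[:m]

    q = feats @ w_q        # (N, C) queries from every point
    k = feats[idx] @ w_k   # (m, C) keys from selected points only
    v = feats[idx] @ w_v   # (m, C) values from selected points only

    # Simplified vector attention: one weight per channel, derived from
    # the query-key difference (the usual learned MLP is omitted here).
    rel = q[:, None, :] - k[None, :, :]          # (N, m, C)
    rel = rel - rel.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(rel)
    w = w / w.sum(axis=1, keepdims=True)         # softmax over the m keys

    out = (w * v[None, :, :]).sum(axis=1)        # (N, C)
    return out, idx

# Toy usage with random data.
rng = np.random.default_rng(0)
n, c, m = 64, 8, 4
feats = rng.normal(size=(n, c))
scores = rng.normal(size=n)
w_q, w_k, w_v = (rng.normal(size=(c, c)) for _ in range(3))
out, idx = sparse_vector_attention(feats, scores, m, w_q, w_k, w_v)
print(out.shape, idx.shape)
```

The intermediate tensor has shape (N, m, C) rather than (N, N, C), so memory and compute grow linearly in N once m is fixed. That is the kind of saving the abstract attributes to attending only to representative points.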
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.