Jinzheng Guang; Shichao Wu; Zhengxi Hu; Qianyi Zhang; Peng Wu; Jingtai Liu
{"title":"dcla:基于lidar的三维行人检测中线性关注的密集交叉连接","authors":"Jinzheng Guang;Shichao Wu;Zhengxi Hu;Qianyi Zhang;Peng Wu;Jingtai Liu","doi":"10.1109/TCSVT.2024.3515996","DOIUrl":null,"url":null,"abstract":"LiDAR-based 3D pedestrian detection has recently been extensively applied in autonomous driving and intelligent mobile robots. However, it remains a highly challenging perceptual task due to the sparsity of pedestrian point cloud data and the significant deformation of pedestrian body postures. To address these challenges, we propose a Dense Cross Connections network with Linear Attention (DCCLA), which mitigates the semantic discrepancy between the encoder and decoder of the network by integrating multiple 3D sparse convolutional layers within the skip connections. Furthermore, we enhance these connections by introducing cross-connections, thereby effectively promoting information interaction among various channels. To effectively retain crucial information while summarizing diverse pedestrian representations, we propose the Linear Self-Attention module for 3D point clouds (LSA3D), which significantly reduces model complexity. The experimental results demonstrate that our DCCLA achieves state-of-the-art Average Precision (AP) for the 3D pedestrian detection task on the JRDB large-scale dataset, outperforming the second-ranked method by 2.7% AP. Furthermore, our DCCLA enhances 1.6% mIoU over the benchmark method on the SemanticKITTI dataset. Therefore, our method achieves excellent performance through a cross-scale feature fusion strategy and linear attention that fully combines the advantages of convolution and transformer architectures. The project is publicly available at <uri>https://github.com/jinzhengguang/DCCLA</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4535-4548"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DCCLA: Dense Cross Connections With Linear Attention for LiDAR-Based 3D Pedestrian Detection\",\"authors\":\"Jinzheng Guang;Shichao Wu;Zhengxi Hu;Qianyi Zhang;Peng Wu;Jingtai Liu\",\"doi\":\"10.1109/TCSVT.2024.3515996\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"LiDAR-based 3D pedestrian detection has recently been extensively applied in autonomous driving and intelligent mobile robots. However, it remains a highly challenging perceptual task due to the sparsity of pedestrian point cloud data and the significant deformation of pedestrian body postures. To address these challenges, we propose a Dense Cross Connections network with Linear Attention (DCCLA), which mitigates the semantic discrepancy between the encoder and decoder of the network by integrating multiple 3D sparse convolutional layers within the skip connections. Furthermore, we enhance these connections by introducing cross-connections, thereby effectively promoting information interaction among various channels. To effectively retain crucial information while summarizing diverse pedestrian representations, we propose the Linear Self-Attention module for 3D point clouds (LSA3D), which significantly reduces model complexity. The experimental results demonstrate that our DCCLA achieves state-of-the-art Average Precision (AP) for the 3D pedestrian detection task on the JRDB large-scale dataset, outperforming the second-ranked method by 2.7% AP. 
Furthermore, our DCCLA enhances 1.6% mIoU over the benchmark method on the SemanticKITTI dataset. Therefore, our method achieves excellent performance through a cross-scale feature fusion strategy and linear attention that fully combines the advantages of convolution and transformer architectures. The project is publicly available at <uri>https://github.com/jinzhengguang/DCCLA</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4535-4548\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10795156/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10795156/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
DCCLA: Dense Cross Connections With Linear Attention for LiDAR-Based 3D Pedestrian Detection
LiDAR-based 3D pedestrian detection has recently been widely applied in autonomous driving and intelligent mobile robots. However, it remains a highly challenging perceptual task due to the sparsity of pedestrian point cloud data and the significant deformation of pedestrian body postures. To address these challenges, we propose a Dense Cross Connections network with Linear Attention (DCCLA), which mitigates the semantic discrepancy between the encoder and decoder of the network by integrating multiple 3D sparse convolutional layers within the skip connections. Furthermore, we augment these skip connections with cross-connections, effectively promoting information interaction across channels. To retain crucial information while summarizing diverse pedestrian representations, we propose the Linear Self-Attention module for 3D point clouds (LSA3D), which significantly reduces model complexity. The experimental results demonstrate that our DCCLA achieves state-of-the-art Average Precision (AP) for the 3D pedestrian detection task on the large-scale JRDB dataset, outperforming the second-ranked method by 2.7% AP. In addition, our DCCLA improves mIoU by 1.6% over the benchmark method on the SemanticKITTI dataset. Our method thus achieves excellent performance through a cross-scale feature fusion strategy and linear attention that combine the advantages of convolutional and transformer architectures. The project is publicly available at https://github.com/jinzhengguang/DCCLA.
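For intuition only, the following is a minimal sketch of linear self-attention over per-point features, assuming the common elu(x)+1 kernel feature map from the linear-attention literature. It is not the paper's LSA3D module; the class name LinearSelfAttention and the toy shapes are hypothetical. It illustrates why such attention scales linearly with the number of points N, rather than quadratically as in standard softmax attention.

# Illustrative sketch (PyTorch), not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) per-point features, N = number of points
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)

        # Positive feature map so the kernelized attention weights stay non-negative.
        q = F.elu(q) + 1.0
        k = F.elu(k) + 1.0

        # Aggregate keys and values first: kv is (B, C, C), k_sum is (B, C).
        # This avoids ever forming the (B, N, N) attention matrix.
        kv = torch.einsum("bnc,bnd->bcd", k, v)
        k_sum = k.sum(dim=1)

        # Attend: numerator (B, N, C), normalizer (B, N, 1).
        num = torch.einsum("bnc,bcd->bnd", q, kv)
        den = torch.einsum("bnc,bc->bn", q, k_sum).unsqueeze(-1).clamp(min=1e-6)
        return self.proj(num / den)


if __name__ == "__main__":
    feats = torch.randn(2, 4096, 64)   # 2 scenes, 4096 points, 64-dim features
    attn = LinearSelfAttention(dim=64)
    out = attn(feats)                  # (2, 4096, 64); no N x N matrix is materialized
    print(out.shape)

The key point of this construction is that keys and values are condensed into a C-by-C summary before queries attend to them, which is what drops the cost from O(N^2) to O(N) in the number of points.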
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.