Jinzheng Guang; Shichao Wu; Zhengxi Hu; Qianyi Zhang; Peng Wu; Jingtai Liu
{"title":"dcla:基于lidar的三维行人检测中线性关注的密集交叉连接","authors":"Jinzheng Guang;Shichao Wu;Zhengxi Hu;Qianyi Zhang;Peng Wu;Jingtai Liu","doi":"10.1109/TCSVT.2024.3515996","DOIUrl":null,"url":null,"abstract":"LiDAR-based 3D pedestrian detection has recently been extensively applied in autonomous driving and intelligent mobile robots. However, it remains a highly challenging perceptual task due to the sparsity of pedestrian point cloud data and the significant deformation of pedestrian body postures. To address these challenges, we propose a Dense Cross Connections network with Linear Attention (DCCLA), which mitigates the semantic discrepancy between the encoder and decoder of the network by integrating multiple 3D sparse convolutional layers within the skip connections. Furthermore, we enhance these connections by introducing cross-connections, thereby effectively promoting information interaction among various channels. To effectively retain crucial information while summarizing diverse pedestrian representations, we propose the Linear Self-Attention module for 3D point clouds (LSA3D), which significantly reduces model complexity. The experimental results demonstrate that our DCCLA achieves state-of-the-art Average Precision (AP) for the 3D pedestrian detection task on the JRDB large-scale dataset, outperforming the second-ranked method by 2.7% AP. Furthermore, our DCCLA enhances 1.6% mIoU over the benchmark method on the SemanticKITTI dataset. Therefore, our method achieves excellent performance through a cross-scale feature fusion strategy and linear attention that fully combines the advantages of convolution and transformer architectures. The project is publicly available at <uri>https://github.com/jinzhengguang/DCCLA</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4535-4548"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DCCLA: Dense Cross Connections With Linear Attention for LiDAR-Based 3D Pedestrian Detection\",\"authors\":\"Jinzheng Guang;Shichao Wu;Zhengxi Hu;Qianyi Zhang;Peng Wu;Jingtai Liu\",\"doi\":\"10.1109/TCSVT.2024.3515996\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"LiDAR-based 3D pedestrian detection has recently been extensively applied in autonomous driving and intelligent mobile robots. However, it remains a highly challenging perceptual task due to the sparsity of pedestrian point cloud data and the significant deformation of pedestrian body postures. To address these challenges, we propose a Dense Cross Connections network with Linear Attention (DCCLA), which mitigates the semantic discrepancy between the encoder and decoder of the network by integrating multiple 3D sparse convolutional layers within the skip connections. Furthermore, we enhance these connections by introducing cross-connections, thereby effectively promoting information interaction among various channels. To effectively retain crucial information while summarizing diverse pedestrian representations, we propose the Linear Self-Attention module for 3D point clouds (LSA3D), which significantly reduces model complexity. The experimental results demonstrate that our DCCLA achieves state-of-the-art Average Precision (AP) for the 3D pedestrian detection task on the JRDB large-scale dataset, outperforming the second-ranked method by 2.7% AP. 
Furthermore, our DCCLA enhances 1.6% mIoU over the benchmark method on the SemanticKITTI dataset. Therefore, our method achieves excellent performance through a cross-scale feature fusion strategy and linear attention that fully combines the advantages of convolution and transformer architectures. The project is publicly available at <uri>https://github.com/jinzhengguang/DCCLA</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4535-4548\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10795156/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10795156/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
DCCLA: Dense Cross Connections With Linear Attention for LiDAR-Based 3D Pedestrian Detection
LiDAR-based 3D pedestrian detection has recently been widely applied in autonomous driving and intelligent mobile robots. However, it remains a highly challenging perceptual task due to the sparsity of pedestrian point cloud data and the significant deformation of pedestrian body postures. To address these challenges, we propose a Dense Cross Connections network with Linear Attention (DCCLA), which mitigates the semantic discrepancy between the encoder and decoder of the network by integrating multiple 3D sparse convolutional layers within the skip connections. Furthermore, we augment these skip connections with cross-connections, effectively promoting information interaction across channels. To retain crucial information while summarizing diverse pedestrian representations, we propose the Linear Self-Attention module for 3D point clouds (LSA3D), which significantly reduces model complexity. The experimental results demonstrate that our DCCLA achieves state-of-the-art Average Precision (AP) for the 3D pedestrian detection task on the large-scale JRDB dataset, outperforming the second-ranked method by 2.7% AP. In addition, our DCCLA improves mIoU by 1.6% over the benchmark method on the SemanticKITTI dataset. Our method thus achieves excellent performance through a cross-scale feature fusion strategy and linear attention that combine the advantages of convolutional and transformer architectures. The project is publicly available at https://github.com/jinzhengguang/DCCLA.
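For intuition only, the following is a minimal sketch of linear self-attention over per-point features, assuming the common elu(x)+1 kernel feature map from the linear-attention literature. It is not the paper's LSA3D module; the class name LinearSelfAttention and the toy shapes are hypothetical. It illustrates why such attention scales linearly with the number of points N, rather than quadratically as in standard softmax attention.

# Illustrative sketch (PyTorch), not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) per-point features, N = number of points
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)

        # Positive feature map so the kernelized attention weights stay non-negative.
        q = F.elu(q) + 1.0
        k = F.elu(k) + 1.0

        # Aggregate keys and values first: kv is (B, C, C), k_sum is (B, C).
        # This avoids ever forming the (B, N, N) attention matrix.
        kv = torch.einsum("bnc,bnd->bcd", k, v)
        k_sum = k.sum(dim=1)

        # Attend: numerator (B, N, C), normalizer (B, N, 1).
        num = torch.einsum("bnc,bcd->bnd", q, kv)
        den = torch.einsum("bnc,bc->bn", q, k_sum).unsqueeze(-1).clamp(min=1e-6)
        return self.proj(num / den)


if __name__ == "__main__":
    feats = torch.randn(2, 4096, 64)   # 2 scenes, 4096 points, 64-dim features
    attn = LinearSelfAttention(dim=64)
    out = attn(feats)                  # (2, 4096, 64); no N x N matrix is materialized
    print(out.shape)

The key point of this construction is that keys and values are condensed into a C-by-C summary before queries attend to them, which is what drops the cost from O(N^2) to O(N) in the number of points.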
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.