IEEE Transactions on Circuits and Systems for Video Technology: Latest Publications

Polarity-Focused Denoising for Event Cameras
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-17 DOI: 10.1109/TCSVT.2024.3519430
Chenyang Shi;Boyi Wei;Xiucheng Wang;Hanxiao Liu;Yibo Zhang;Wenzhuo Li;Ningfang Song;Jing Jin
Abstract: Event cameras, which are highly sensitive to light intensity changes, often generate substantial noise during imaging. Existing denoising methods either lack the speed for real-time processing or struggle with dynamic scenes, mistakenly discarding valid events. To address these issues, we propose a novel dual-stage polarity-focused denoising (PFD) method that leverages the consistency of polarity and its changes within local pixel areas. Whether due to camera motion or dynamic scene changes, the polarity and its changes in triggered events are highly correlated with these movements, allowing for effective noise handling. We introduce two versions: PFD-A, which excels at reducing background activity (BA) noise, and PFD-B, which is designed to address both BA and flicker noise. Both versions are lightweight and computationally efficient. The experimental results show that PFD outperforms benchmark methods in terms of the SNR and ESR metrics, achieving state-of-the-art performance across various datasets. Additionally, we propose an FPGA implementation of PFD that processes each event in just 7 clock cycles, ensuring real-time performance. The code is available at https://github.com/shicy17/PFD.
Vol. 35, No. 5, pp. 4370–4383.
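As a rough illustration of what filtering on polarity consistency in a local pixel neighborhood can look like (this is not the authors' PFD-A/PFD-B algorithm; the 3x3 neighborhood, time window, and support threshold below are assumptions made for the sketch):

```python
import numpy as np

def polarity_consistency_filter(events, height, width, dt=10_000, min_support=2):
    """Keep an event only if enough recent neighbours share its polarity.

    events: iterable of (x, y, t, p), t in microseconds, p in {-1, +1}.
    dt and min_support are illustrative choices, not values from the paper.
    """
    last_t = np.full((height, width), -np.inf)          # last event time per pixel
    last_p = np.zeros((height, width), dtype=np.int8)   # last event polarity per pixel
    kept = []
    for x, y, t, p in events:
        y0, y1 = max(0, y - 1), min(height, y + 2)
        x0, x1 = max(0, x - 1), min(width, x + 2)
        recent = (t - last_t[y0:y1, x0:x1]) < dt
        support = np.count_nonzero(recent & (last_p[y0:y1, x0:x1] == p))
        if support >= min_support:
            kept.append((x, y, t, p))
        last_t[y, x], last_p[y, x] = t, p
    return kept

# An isolated event (noise-like) is dropped; a locally consistent one is kept.
evs = [(10, 10, 0, 1), (11, 10, 100, 1), (10, 11, 200, 1), (50, 50, 300, -1)]
print(polarity_consistency_filter(evs, 64, 64))   # -> [(10, 11, 200, 1)]
```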
Citations: 0
Toward Physically Stable Motion Generation: A New Paradigm of Human Pose Representation
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-16 DOI: 10.1109/TCSVT.2024.3518054
Qiongjie Cui;Zhenyu Lou;Zhenbo Song;Xiangbo Shu
Abstract: In machine learning, generating realistic human motion is paramount for a range of applications that require lifelike movements. Traditional methods have often overlooked the adherence to physical principles, leading to motion sequences that exhibit unrealistic behaviors such as foot sliding, penetration, and floating. These issues are particularly pronounced in complex tasks like dance choreography, which demand a higher degree of fidelity and realism. To address these challenges, we introduce RF-Rotation, a novel approach to human pose representation that strategically repositions the root joint of the SMPL model to align with both feet, while representing other joints through recursive bone rotations. It not only aligns more closely with the natural dynamics of human movement but also integrates an advanced contact predictor to ascertain the ground contact status of both feet, thereby preventing physically implausible foot movements. We note that RF-Rotation is compatible with any motion generation task, including dance choreography, text-to-motion synthesis, and motion prediction, and can be seamlessly integrated into existing frameworks without modifications. Extensive experiments across three distinct tasks demonstrate the superior performance of RF-Rotation in enhancing the realism and stability of generated motion sequences. The method significantly reduces foot sliding, floating, and penetration issues without affecting computational efficiency, underscoring its potential to set new standards in human motion generation.
Vol. 35, No. 5, pp. 4158–4171.
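To make "representing joints through recursive bone rotations" concrete, the sketch below runs generic forward kinematics on a toy four-joint chain: each joint's global rotation is the composition of its ancestors' local rotations, and positions follow by rotating the bone offsets. This is the textbook operation, not the SMPL skeleton or the authors' two-feet root placement; the parent layout, bone lengths, and angles are made up.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(parents, offsets, local_rots, root_pos):
    """parents[i]: parent index (-1 for the root); offsets[i]: bone vector in the
    parent's frame; local_rots[i]: 3x3 rotation of joint i relative to its parent."""
    n = len(parents)
    global_rots = [None] * n
    positions = np.zeros((n, 3))
    for i in range(n):                       # assumes parents are listed before children
        if parents[i] == -1:
            global_rots[i] = local_rots[i]
            positions[i] = root_pos
        else:
            p = parents[i]
            global_rots[i] = global_rots[p] @ local_rots[i]           # recursive composition
            positions[i] = positions[p] + global_rots[p] @ offsets[i]
    return positions

parents = [-1, 0, 1, 2]                                   # root -> j1 -> j2 -> j3
offsets = np.array([[0, 0, 0], [0, 0.5, 0], [0, 0.5, 0], [0, 0.5, 0]], dtype=float)
local_rots = [rot_z(0.0)] + [rot_z(np.pi / 6)] * 3        # bend each joint by 30 degrees
print(forward_kinematics(parents, offsets, local_rots, root_pos=np.zeros(3)))
```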
Citations: 0
iESTA: Instance-Enhanced Spatial–Temporal Alignment for Video Copy Localization
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-16 DOI: 10.1109/TCSVT.2024.3517664
Xinmiao Ding;Jinming Lou;Wenyang Luo;Yufan Liu;Bing Li;Weiming Hu
Abstract: Video Copy Segment Localization (VSL) requires the identification of the temporal segments within a pair of videos that contain copied content. Current methods primarily focus on global temporal modeling, overlooking the complementarity of global semantic and local fine-grained features, which limits their effectiveness. Some related methods attempt to incorporate local spatial information but often disrupt spatial semantic structures, resulting in less accurate matching. To address these issues, we propose the Instance-Enhanced Spatial-Temporal Alignment Framework (iESTA), based on a proper representation granularity that integrates instance-level local features and semantic global features. Specifically, the Instance-relation Graph (IRG) is constructed to capture instance-level features and fine-grained interactions, preserving local information integrity and better representing the video feature space at an appropriate granularity. An instance-GNN structure is designed to refine these graph representations. For global features, we enhance the representation of semantic information, capturing temporal relationships within videos using a Transformer framework. Additionally, we design a Complementarity-perception Alignment Module (CAM) to effectively process and integrate complementary spatial-temporal information, producing accurate frame-to-frame alignment maps. Our approach also incorporates a differentiable Dynamic Time Warping (DTW) method to utilize latent temporal alignments as weak supervisory signals, improving the accuracy of the matching process. Experimental results indicate that our proposed iESTA outperforms state-of-the-art methods on both the small-scale dataset VCDB and the large-scale dataset VCSL.
Vol. 35, No. 5, pp. 4409–4422.
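For readers unfamiliar with the alignment component, plain dynamic time warping between two per-frame feature sequences can be sketched as below; the paper uses a differentiable DTW as a weak supervisory signal, which this classic non-differentiable version does not reproduce, and the feature dimensions are arbitrary.

```python
import numpy as np

def dtw_cost(a, b):
    """Classic DTW cost between frame-feature sequences a (n x d) and b (m x d)."""
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise frame distances
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match
    return acc[n, m]

rng = np.random.default_rng(0)
query, reference = rng.normal(size=(20, 128)), rng.normal(size=(30, 128))
print(dtw_cost(query, reference))
```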
Citations: 0
Semantic Scene Completion via Semantic-Aware Guidance and Interactive Refinement Transformer
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-16 DOI: 10.1109/TCSVT.2024.3518493
Haihong Xiao;Wenxiong Kang;Hao Liu;Yuqiong Li;Ying He
Abstract: Predicting per-voxel occupancy status and corresponding semantic labels in 3D scenes is pivotal to 3D intelligent perception in autonomous driving. In this paper, we propose a novel semantic scene completion framework that can generate complete 3D volumetric semantics from a single image at a low cost. To the best of our knowledge, this is the first endeavor specifically aimed at mitigating the negative impacts of incorrect voxel query proposals caused by erroneous depth estimates and enhancing interactions for positive ones in camera-based semantic scene completion tasks. Specifically, we present a straightforward yet effective Semantic-aware Guided (SAG) module, which seamlessly integrates with task-related semantic priors to facilitate effective interactions between image features and voxel query proposals in a plug-and-play manner. Furthermore, we introduce a set of learnable object queries to better perceive objects within the scene. Building on this, we propose an Interactive Refinement Transformer (IRT) block, which iteratively updates voxel query proposals to enhance the perception of semantics and objects within the scene by leveraging the interaction between object queries and voxel queries through query-to-query cross-attention. Extensive experiments demonstrate that our method outperforms existing state-of-the-art approaches, achieving overall improvements of 0.30 and 2.74 in mIoU metric on the SemanticKITTI and SSCBench-KITTI-360 validation datasets, respectively, while also showing superior performance in the aspect of small object generation.
Vol. 35, No. 5, pp. 4212–4225.
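The query-to-query cross-attention that lets voxel query proposals absorb information from the learnable object queries is, at its core, standard scaled dot-product attention. A minimal single-head sketch follows; the projection setup, residual update, and dimensions are assumptions made for illustration, not the paper's IRT block.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_to_query_cross_attention(voxel_q, object_q, seed=0):
    """Voxel queries (N, d) attend to object queries (M, d); single head, toy projections."""
    rng = np.random.default_rng(seed)
    d = voxel_q.shape[-1]
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    q, k, v = voxel_q @ w_q, object_q @ w_k, object_q @ w_v
    attn = softmax(q @ k.T / np.sqrt(d))     # (N, M): how much each voxel attends to each object
    return voxel_q + attn @ v                # residual update of the voxel query proposals

voxel_queries = np.random.default_rng(1).normal(size=(200, 64))   # e.g. 200 voxel proposals
object_queries = np.random.default_rng(2).normal(size=(10, 64))   # e.g. 10 learnable object queries
print(query_to_query_cross_attention(voxel_queries, object_queries).shape)   # (200, 64)
```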
Citations: 0
Hierarchical Frequency-Based Upsampling and Refining for HEVC Compressed Video Enhancement
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-16 DOI: 10.1109/TCSVT.2024.3517840
Qianyu Zhang;Bolun Zheng;Xingying Chen;Quan Chen;Zunjie Zhu;Canjin Wang;Zongpeng Li;Xu Jia;Chengang Yan
Abstract: Video compression artifacts arise from quantization applied in the frequency domain. Video quality enhancement aims to reduce such compression artifacts and reconstruct a visually pleasant result. While existing methods effectively reduce artifacts in the spatial domain, they often overlook the rich frequency-domain information, especially in addressing multi-scale compression artifacts. This work introduces a frequency-domain upsampling strategy within a multi-scale framework, specifically designed to focus on high-frequency details rather than simply blending neighboring pixels during the upsampling process. Our proposed hierarchical frequency-based upsampling and refinement neural network (HFUR) consists of two modules: implicit frequency upsampling (ImpFreqUp) and hierarchical and iterative refinement (HIR). ImpFreqUp exploits the DCT-domain prior derived through an implicit DCT transform, and accurately reconstructs the DCT-domain signal via a coarse-to-fine transfer. Additionally, HIR is introduced to facilitate cross-collaboration and information compensation between the scales, further refining the feature maps and promoting the visual quality of the final output. We demonstrate the effectiveness of the proposed modules via ablation experiments and visualized results. Experimental results demonstrate that HFUR outperforms state-of-the-art methods by up to 0.13 dB/0.17 dB under both constant bit rate and constant QP modes. The code is available at https://github.com/zqqqyu/HFUR.
Vol. 35, No. 5, pp. 4423–4436.
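For readers who have not seen frequency-domain upsampling before, the classical DCT version of the idea is to zero-pad a patch's DCT coefficients and invert the larger transform, so interpolation happens in the frequency domain rather than by blending neighboring pixels. The sketch below shows only that textbook operation; ImpFreqUp itself is a learned, implicit variant that this does not reproduce.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_upsample(patch, factor=2):
    """Upsample a 2D patch by zero-padding its orthonormal DCT-II coefficients."""
    h, w = patch.shape
    coeffs = dctn(patch, norm="ortho")
    padded = np.zeros((h * factor, w * factor))
    padded[:h, :w] = coeffs                       # keep low frequencies, leave high ones zero
    scale = np.sqrt(padded.size / patch.size)     # amplitude correction for the ortho DCT
    return idctn(padded * scale, norm="ortho")

patch = np.outer(np.hanning(8), np.hanning(8))    # toy 8x8 patch
up = dct_upsample(patch, factor=2)                # 16x16 result
print(patch.mean(), up.mean())                    # the mean intensity is preserved
```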
Citations: 0
Denser Teacher: Rethinking Dense Pseudo-Label for Semi-Supervised Oriented Object Detection
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-16 DOI: 10.1109/TCSVT.2024.3518452
Tong Zhao;Qiang Fang;Xin Xu
Abstract: Oriented object detection, which aims to detect multi-oriented objects, is a fundamental task for visual analysis in complex scenarios, such as aerial images. However, powerful detection performance relies on abundant and accurate annotations. Therefore, semi-supervised oriented object detection, which utilizes unlabeled data to improve performance, is a promising method to address this problem. In this work, we explore Dense Pseudo-Label (DPL), which directly selects pseudo labels from the original output of the teacher model without any complicated post-processing steps, and expose the shortcomings of existing methods. Through analysis, we identify that the imbalance between obtaining potential positive samples and removing the interference of inaccurate pseudo labels hinders the effectiveness of DPL. To further improve DPL efficiency, we propose Denser Teacher, a new semi-supervised oriented object detection method. In this method, we design a simple yet effective adaptive mechanism called global dynamic k estimation to guide the selection of DPLs in densely-distributed scenes. Additionally, to improve scale adaptation, we introduce dense multi-scale learning for DPL, where DPLs from different scales are utilized to bridge the scale gap. We conduct extensive experiments on several benchmarks to demonstrate the effectiveness of our proposed method in leveraging unlabeled data for performance improvement. Our code will be available at https://github.com/Haru-zt/DenserTeacher.
Vol. 35, No. 5, pp. 4549–4559. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10802941
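The abstract does not specify how global dynamic k estimation works, so the sketch below only illustrates the surrounding mechanics of dense pseudo-labelling: take the teacher's per-anchor confidences, estimate how many positives the image likely contains, and keep that many top-scoring dense predictions as pseudo-labels. Estimating k from the total confidence mass is an assumption made for this sketch, not the paper's estimator.

```python
import numpy as np

def select_dense_pseudo_labels(teacher_scores, min_k=1):
    """teacher_scores: (num_anchors,) max class confidence of each dense prediction."""
    # Crude global estimate of the number of positives: total confidence mass, rounded.
    k = int(round(teacher_scores.sum()))
    k = int(np.clip(k, min_k, teacher_scores.size))
    keep = np.argsort(teacher_scores)[::-1][:k]   # indices of the k highest-scoring anchors
    return keep

scores = np.array([0.95, 0.80, 0.10, 0.05, 0.40, 0.02])
print(select_dense_pseudo_labels(scores))   # total mass ~2.3 -> k = 2 -> anchors [0, 1]
```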
Citations: 0
DAWN: Domain-Adaptive Weakly Supervised Nuclei Segmentation via Cross-Task Interactions
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-13 DOI: 10.1109/TCSVT.2024.3515467
Ye Zhang;Yifeng Wang;Zijie Fang;Hao Bian;Linghan Cai;Ziyue Wang;Yongbing Zhang
Abstract: Weakly supervised segmentation methods have garnered considerable attention due to their potential to alleviate the need for labor-intensive pixel-level annotations during model training. Traditional weakly supervised nuclei segmentation approaches typically involve a two-stage process: pseudo-label generation followed by network training. The performance of these methods is highly dependent on the quality of the generated pseudo-labels, which can limit their effectiveness. In this paper, we propose a novel domain-adaptive weakly supervised nuclei segmentation framework that addresses the challenge of pseudo-label generation through cross-task interaction strategies. Specifically, our approach leverages weakly annotated data to train an auxiliary detection task, which facilitates domain adaptation of the segmentation network. To improve the efficiency of domain adaptation, we introduce a consistent feature constraint module that integrates prior knowledge from the source domain. Additionally, we develop methods for pseudo-label optimization and interactive training to enhance domain transfer capabilities. We validate the effectiveness of our proposed method through extensive comparative and ablation experiments conducted on six datasets. The results demonstrate that our approach outperforms existing weakly supervised methods and achieves performance comparable to or exceeding that of fully supervised methods. Our code is available at https://github.com/zhangye-zoe/DAWN.
Vol. 35, No. 5, pp. 4753–4767.
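The consistent feature constraint is described only at a high level; one common way such a constraint is realized in domain adaptation is to pull target-domain feature statistics toward statistics stored from the source domain. The sketch below shows that generic recipe, not necessarily DAWN's module, and the prototype here is a placeholder.

```python
import numpy as np

def feature_consistency_loss(target_feats, source_prototype):
    """Mean-squared distance between a target batch's mean feature and a source prototype.

    target_feats:     (B, C) pooled features from a target-domain batch
    source_prototype: (C,)   feature mean precomputed on the source domain (prior knowledge)
    """
    return float(np.mean((target_feats.mean(axis=0) - source_prototype) ** 2))

target = np.random.default_rng(0).normal(size=(16, 256))
prototype = np.zeros(256)          # placeholder for stored source-domain statistics
print(feature_consistency_loss(target, prototype))
```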
Citations: 0
FLDet: Faster and Lighter Aerial Object Detector
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-13 DOI: 10.1109/TCSVT.2024.3516760
Shuyang Wang;Kang Liu;Ju Huang;Xuelong Li
Abstract: In the rapidly evolving field of unmanned aerial vehicles (UAVs), real-time object detection is crucial for enhancing UAV intelligence. However, existing research often prioritizes complex networks to boost performance, neglecting the inherent computational resource constraints of UAVs. This paper presents FLDet, a family of faster and lighter detectors specifically designed for UAVs. By revisiting the architecture of modern lightweight detectors from a top-down perspective, FLDet offers a novel and comprehensive redesign of the head, neck, and backbone components. Firstly, we propose a Scale Sparse Head (SSH) that utilizes only two heads to detect objects of varying sizes, leveraging scale sparse feature pyramids to balance performance and efficiency. This design provides heuristic guidance for detector architecture development, offering a new paradigm for detector development. Secondly, a Partial Interaction Neck (PIN) is introduced to facilitate partial interaction between different feature scales, thereby reducing computational costs while effectively integrating multi-scale information. Thirdly, inspired by the primate visual pathway, a Stage-Wise Heterogeneous Network (SHN) is presented, employing heterogeneous blocks to capture both local details and contextual information. Finally, we develop a training strategy called Decay Data Augmentation (DDA) to enhance the detector's generalization capability, leveraging diverse representations generated by strong data augmentation techniques. Experimental results on two challenging UAV-view detection benchmarks, VisDrone2019 and UAVDT, demonstrate that FLDet achieves a state-of-the-art balance among accuracy, latency, and parameter efficiency. In real-scenario tests, the fastest variant, FLDet-N, achieves real-time performance exceeding 52 FPS on an NVIDIA Jetson Xavier NX with only 1.2M parameters. The source code is available at https://github.com/wsy-yjys/FLDet.
Vol. 35, No. 5, pp. 4450–4463.
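Decay Data Augmentation is only named in the abstract, so the sketch below shows one plausible reading of "decay": the probability of applying strong augmentation shrinks linearly over training, so later epochs see cleaner data. The schedule and the augmentation names are assumptions made for illustration.

```python
import random

STRONG_AUGS = ["mosaic", "mixup", "large_scale_jitter"]   # illustrative names only

def strong_aug_probability(epoch, total_epochs, p_start=1.0, p_end=0.0):
    """Linearly decay the chance of applying strong augmentation over training."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return p_start + frac * (p_end - p_start)

def pick_augmentations(epoch, total_epochs, rng=random):
    if rng.random() < strong_aug_probability(epoch, total_epochs):
        return STRONG_AUGS          # strong pipeline early in training
    return []                       # weak / no extra augmentation late in training

for e in (0, 50, 99):
    print(e, round(strong_aug_probability(e, 100), 2))   # 1.0 -> ~0.49 -> 0.0
```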
Citations: 0
See Through Water: Heuristic Modeling Toward Color Correction for Underwater Image Enhancement
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-13 DOI: 10.1109/TCSVT.2024.3516781
Junyu Fan;Jie Xu;Jingchun Zhou;Danling Meng;Yi Lin
Abstract: Color cast is one of the main degradations in underwater images. Existing data-driven methods, while capable of learning color correction rules from large datasets, often overlook the imaging characteristics and light behavior in underwater environments, making them unable to accurately restore colors in complex water bodies. To address this, we use color constancy and an underwater imaging model to heuristically model the underwater environment for accurate color restoration. On one hand, we propose a multi-scale joint prior network architecture to fully explore the rich feature-level information at different scales in underwater images. This is used to fit the complex parameters of the underwater imaging model, deriving high-quality potential undegraded images. On the other hand, to tackle the challenges of color distortion caused by complex imaging factors in different water environments, we estimate the background light of the water body through the color constancy of underwater objects and dynamically incorporate it into the underwater imaging model as a prior. This not only guides the learning process more effectively but also allows the model to consider key aspects of underwater optical propagation, making it adaptable to different water environments and improving the color accuracy of the enhanced images. We have also conducted extensive experiments to demonstrate the effectiveness of the proposed method, which not only achieves the best overall performance in qualitative analysis and quantitative comparison but also boasts the best color accuracy and the fastest inference speed. The code is available at https://github.com/JunyuFan/MJPNet.
Vol. 35, No. 5, pp. 4039–4054.
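The underwater imaging model referred to here is commonly written as I(x) = J(x) * t(x) + B * (1 - t(x)), where J is the undegraded scene, t the transmission, and B the background light; once B and t are estimated, J is recovered by inverting the model. The per-channel sketch below shows that inversion with placeholder B and t (the paper estimates B via color constancy and fits the remaining parameters with its network, which this does not reproduce).

```python
import numpy as np

def invert_underwater_model(image, background_light, transmission, t_min=0.1):
    """Recover the latent scene J from I = J * t + B * (1 - t).

    image:            (H, W, 3) observed image in [0, 1]
    background_light: (3,)      estimated per-channel background light B
    transmission:     (H, W, 1) or (H, W, 3) estimated transmission map t
    """
    t = np.clip(transmission, t_min, 1.0)                  # avoid dividing by tiny t
    j = (image - background_light * (1.0 - t)) / t
    return np.clip(j, 0.0, 1.0)

image = np.full((4, 4, 3), 0.35)                           # toy uniformly tinted patch
B = np.array([0.10, 0.50, 0.60])                           # placeholder background light
t = np.full((4, 4, 1), 0.7)                                # placeholder transmission
print(invert_underwater_model(image, B, t)[0, 0])
```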
Citations: 0
DCCLA: Dense Cross Connections With Linear Attention for LiDAR-Based 3D Pedestrian Detection
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-12-12 DOI: 10.1109/TCSVT.2024.3515996
Jinzheng Guang;Shichao Wu;Zhengxi Hu;Qianyi Zhang;Peng Wu;Jingtai Liu
Abstract: LiDAR-based 3D pedestrian detection has recently been extensively applied in autonomous driving and intelligent mobile robots. However, it remains a highly challenging perceptual task due to the sparsity of pedestrian point cloud data and the significant deformation of pedestrian body postures. To address these challenges, we propose a Dense Cross Connections network with Linear Attention (DCCLA), which mitigates the semantic discrepancy between the encoder and decoder of the network by integrating multiple 3D sparse convolutional layers within the skip connections. Furthermore, we enhance these connections by introducing cross-connections, thereby effectively promoting information interaction among various channels. To effectively retain crucial information while summarizing diverse pedestrian representations, we propose the Linear Self-Attention module for 3D point clouds (LSA3D), which significantly reduces model complexity. The experimental results demonstrate that our DCCLA achieves state-of-the-art Average Precision (AP) for the 3D pedestrian detection task on the JRDB large-scale dataset, outperforming the second-ranked method by 2.7% AP. Furthermore, our DCCLA improves mIoU by 1.6% over the benchmark method on the SemanticKITTI dataset. Therefore, our method achieves excellent performance through a cross-scale feature fusion strategy and linear attention that fully combines the advantages of convolution and transformer architectures. The project is publicly available at https://github.com/jinzhengguang/DCCLA.
Vol. 35, No. 5, pp. 4535–4548.
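Linear attention, as the module name suggests, replaces the softmax attention matrix with a kernel feature map so that keys and values can be aggregated once, bringing the cost from quadratic to linear in the number of points. A minimal sketch of that general mechanism follows; the feature map (elu + 1) and the dimensions are assumptions, not the exact LSA3D design.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map commonly used for linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v, eps=1e-6):
    """O(N) attention: softmax(QK^T)V becomes phi(Q)(phi(K)^T V) / (phi(Q) phi(K)^T 1)."""
    q, k = elu_plus_one(q), elu_plus_one(k)     # (N, d), (M, d)
    kv = k.T @ v                                # (d, d_v): keys and values aggregated once
    z = k.sum(axis=0)                           # (d,): normalizer
    return (q @ kv) / (q @ z + eps)[:, None]    # (N, d_v)

rng = np.random.default_rng(0)
points_q = rng.normal(size=(1000, 32))          # e.g. 1000 point features as queries
points_k = rng.normal(size=(1000, 32))
values = rng.normal(size=(1000, 64))
print(linear_attention(points_q, points_k, values).shape)   # (1000, 64)
```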
Citations: 0