Redundant contextual feature suppression for pedestrian detection in dense scenes

IF 2.7, Zone 3 (Engineering & Technology), JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication Pub Date : 2025-09-10 DOI:10.1016/j.image.2025.117403

Jun Wang, Lei Wan, Xin Zhang, Xiaotian Cao

Signal Processing-Image Communication, volume 139, Article 117403. Journal Article. https://www.sciencedirect.com/science/article/pii/S0923596525001493

Citations: 0

Abstract

Pedestrian detection is an important branch of object detection, with wide application in autonomous driving, intelligent video surveillance, and passenger flow statistics. However, these scenes exhibit high pedestrian density, severe occlusion, and complex redundant contextual information, so current general-purpose object detectors suffer from low detection accuracy and many false positives when applied to dense pedestrian scenes. In this paper, we propose Context Suppressed R-CNN, an improved method for pedestrian detection in dense scenes based on Sparse R-CNN. First, to strengthen the network’s ability to extract deep features in dense scenes, we introduce the CoT-FPN backbone, which combines the FPN network with the Contextual Transformer Block; this block replaces the 3 × 3 convolution in the ResNet backbone. Second, because redundant contextual features of instance objects can mislead localization and recognition in dense scenes, we propose a Redundant Contextual Feature Suppression Module (RCFSM). This module, based on the convolutional block attention mechanism, suppresses redundant contextual information in instance features, thereby improving the network’s detection performance in dense scenes. Test results on the CrowdHuman dataset show that, compared with Sparse R-CNN, the proposed algorithm improves Average Precision (AP) by 1.1% and the Jaccard index by 1.2%, while also reducing the number of model parameters. Code is available at https://github.com/davidsmithwj/CS-CS-RCNN.
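The abstract reports a gain in the Jaccard index without defining it. As a rough illustration only, the sketch below computes a Jaccard-style score for one image in the way it is commonly described for crowded-scene evaluation: the number of one-to-one matches between detections and ground-truth boxes, divided by the size of their union. The function names, the `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, and the greedy matching are all assumptions made for this example; they are not taken from the paper or its code, and an official evaluation would typically use bipartite matching and also handle confidence thresholds and ignore regions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def jaccard_index(dets, gts, iou_thr=0.5):
    """Simplified per-image score: matched / (|dets| + |gts| - matched).

    Assumption: greedy one-to-one matching by descending IoU stands in
    for the bipartite matching a full evaluation would use.
    """
    pairs = sorted(
        ((iou(d, g), i, j) for i, d in enumerate(dets) for j, g in enumerate(gts)),
        reverse=True,
    )
    used_dets, used_gts = set(), set()
    matched = 0
    for score, i, j in pairs:
        if score < iou_thr:
            break  # remaining pairs overlap even less
        if i in used_dets or j in used_gts:
            continue  # each box may be matched at most once
        used_dets.add(i)
        used_gts.add(j)
        matched += 1
    denom = len(dets) + len(gts) - matched
    # Convention chosen for this sketch: two empty sets count as perfect agreement.
    return matched / denom if denom else 1.0


if __name__ == "__main__":
    ground_truth = [(0, 0, 10, 10), (20, 0, 30, 10)]
    detections = [(0, 0, 10, 10), (21, 0, 31, 10), (50, 50, 60, 60)]
    print(jaccard_index(detections, ground_truth))
```

In this toy case two of the three detections match a ground-truth box, so the unmatched third detection (a false positive) lowers the score; this is why a metric of this kind is sensitive to the false positives the abstract mentions.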


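Source journal

Signal Processing-Image Communication (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.40

Self-citation rate: 2.90%

Articles published: 138

Review time: 5.2 months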

Journal description: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are to present a forum for the advancement of theory and practice of image communication; to stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems; and to contribute to a rapid information exchange between the industrial and academic environments.

The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The Journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.

Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments. Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, and architectures for image/video processing and communication.