CSPPNet: Cascade space pyramid pooling network for object detection
Yafeng Liu, Yongsheng Dong
Computer Vision and Image Understanding, vol. 258, Article 104377. Published 2025-05-14.
DOI: 10.1016/j.cviu.2025.104377
URL: https://www.sciencedirect.com/science/article/pii/S1077314225001006
Abstract
Real-time object detection, an important research direction in computer vision, aims to detect objects both quickly and accurately. However, many current methods fail to balance speed, parameter count, and accuracy. To alleviate this problem, we construct a novel cascade spatial pyramid pooling network (CSPPNet) for object detection. In particular, we first propose a cascade feature fusion (CFF) module, which combines a novel cascade cross-layer structure with GSConv to lighten the existing neck structure and improve detection accuracy without adding a large number of parameters. In addition, to alleviate the loss of feature detail caused by max pooling, we further propose the nest space pooling (NSP) module, which combines nest feature fusion with max pooling operations to improve the fusion of local and global feature information. Experimental results show that our CSPPNet is competitive, achieving 43.1% AP on the MS-COCO 2017 test-dev dataset.
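The abstract does not specify the internals of the NSP module, so the following is only a generic illustration of the idea it builds on: applying stride-1 max pooling in a cascade and keeping every intermediate output, so that fine (local) and coarse (more global) responses can be fused later. The function names, the 3x3 window, and the three stages are assumptions made for this sketch and are not taken from the paper.

```python
def max_pool_same(grid, k):
    """Stride-1 max pooling with a k x k window.

    Windows are clipped at the borders (equivalent to padding with -inf),
    so the output has the same height and width as the input.
    """
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def cascade_pool_features(grid, k=3, stages=3):
    """Hypothetical helper: return [input, pool_1, pool_2, ...].

    Each pooling stage is applied to the previous stage's output, so the
    effective window grows (3x3, 5x5, 7x7 for k=3) while the unpooled
    input is kept alongside for later fusion.
    """
    features = [grid]
    current = grid
    for _ in range(stages):
        current = max_pool_same(current, k)
        features.append(current)
    return features
```

Keeping the unpooled input next to the pooled maps is what lets a later fusion step recover detail that pooling alone would discard. Cascading small windows reaches the same receptive field as one large window: two 3x3 stages give the same result as a single 5x5 pool.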
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems