RC-SODet: Reparameterized dual convolutions and compact feature enhancement for small object detector

IF 4.2; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence

Image and Vision Computing; Pub Date: 2025-04-28; DOI: 10.1016/j.imavis.2025.105552

Ze Wu, Zhongxu Li, Huan Lei, Hong Zhao, Wenyuan Yang

Volume 159, Article 105552; Journal Article; not open access
Full text: https://www.sciencedirect.com/science/article/pii/S0262885625001404

Citations: 0

Abstract


RC-SODet: Reparameterized dual convolutions and compact feature enhancement for small object detector

Small object detection has broad application prospects within object detection. However, detection models often suffer from insufficient image features for small objects and from limited computational resources. To address these issues, we propose RC-SODet, a small object detector that combines reparameterization techniques with dual convolutions and compact feature enhancement blocks.

In the detector, we design Reparameterized Dual Convolutions (RepDuConv) to replace conventional convolution and downsampling blocks. Its dual-branch design maintains accuracy, and the reparameterization technique built on it significantly improves inference efficiency. The Compact Feature-enhanced Pyramid Network (RC-FPN) serves as the neck, using reparameterizable Cross Stage Partial with Feature Fusion Reparameterized Compact Blocks (C2fRCB) for feature enhancement.

First, in the backbone network, RepDuConv replaces convolution blocks to downsample the input images, producing multi-scale features that are passed to the neck. Second, the model uses RC-FPN as the feature pyramid neck to process the multi-scale features from the backbone. After each front-end upsampling and fusion step, a dual-layer C2fRCB further refines and enhances the tensor features at each fusion scale. Finally, multi-level feature maps are fused at the back end and passed to the detection head. In addition, at the inference stage, both RepDuConv and C2fRCB optimize their branch structures through reparameterization.

Experimental results show that on the small object datasets VisDrone and DroneVehicle, the highest version of RC-SODet achieves 48.1% and 82.4% mAP50, and 30.1% and 59.1% mAP50-95, respectively. The designed reparameterization technique increases model inference speed by 58.1%.
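The abstract does not describe the internal layout of RepDuConv or C2fRCB, so the sketch below is only a generic illustration of structural reparameterization, in the style popularized by RepVGG, and not the paper's method. It assumes a hypothetical block with a 3x3 branch and a parallel 1x1 branch whose outputs are summed, and shows that both branches can be folded into a single 3x3 kernel that gives the same output with one convolution. The names `conv2d`, `two_branch_forward`, and `merge_branches`, and all kernel values, are made up for the example.

```python
# Generic structural re-parameterization sketch. This is NOT RepDuConv:
# the two-branch layout (3x3 + 1x1, summed) is an assumption for illustration.

def conv2d(image, kernel):
    """Stride-1, zero-padded 2D cross-correlation with an odd square kernel."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def two_branch_forward(image, k3, k1):
    """Multi-branch form: run both branches, then add their outputs."""
    a = conv2d(image, k3)
    b = conv2d(image, k1)
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def merge_branches(k3, k1):
    """Fold the 1x1 branch into the 3x3 branch by adding it at the centre tap."""
    merged = [row[:] for row in k3]  # copy so the original kernel is untouched
    merged[1][1] += k1[0][0]
    return merged


# Hypothetical single-channel input and branch weights.
image = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 8.0, 7.0, 6.0],
    [5.0, 4.0, 3.0, 2.0],
]
K3 = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
K1 = [[0.5]]

multi_branch_out = two_branch_forward(image, K3, K1)   # two convolutions
single_branch_out = conv2d(image, merge_branches(K3, K1))  # one convolution

max_diff = max(
    abs(p - q)
    for row_a, row_b in zip(multi_branch_out, single_branch_out)
    for p, q in zip(row_a, row_b)
)
print("max abs difference between the two forms:", max_diff)
```

The merge works because convolution is linear in its kernel: a 1x1 kernel is equivalent to a 3x3 kernel that is zero everywhere except the centre, so the two kernels can simply be added. Real reparameterized blocks typically also fold batch normalization into the weights and handle multiple channels, which this single-channel sketch leaves out.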


Source journal

Image and Vision Computing (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Publications: 143
Review time: 7.8 months

Journal description: The primary aim of Image and Vision Computing is to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.