Ze Wu, Zhongxu Li, Huan Lei, Hong Zhao, Wenyuan Yang
Image and Vision Computing, Volume 159, Article 105552. Published 2025-04-28. Impact factor: 4.2 (JCR Q2, Computer Science, Artificial Intelligence).
DOI: 10.1016/j.imavis.2025.105552
URL: https://www.sciencedirect.com/science/article/pii/S0262885625001404
RC-SODet: Reparameterized dual convolutions and compact feature enhancement for small object detector
In the field of object detection, small object detection tasks have broad application prospects. However, detection models often face two issues: insufficient image features for small objects and limited computational resources. To address these issues, we propose RC-SODet, a small object detector that uses reparameterization techniques combined with dual convolutions and compact feature enhancement blocks. In the detector, we design Reparameterized Dual Convolutions (RepDuConv) to replace conventional convolution and downsampling blocks. Its dual-branch design maintains accuracy, and the reparameterization technique built on it significantly improves inference efficiency. The Compact Feature-enhanced Pyramid Network (RC-FPN) serves as the neck, using reparameterizable Cross Stage Partial with Feature Fusion Reparameterized Compact Blocks (C2fRCB) for feature enhancement. First, in the backbone network, RepDuConv replaces convolution blocks to downsample the input images, thereby obtaining multi-scale features to pass to the neck. Second, the model uses RC-FPN as the feature pyramid neck to process the multi-scale features from the backbone. After each front-end upsampling and fusion, a dual-layer C2fRCB is applied to further refine and enhance the tensor features at different fusion scales. Finally, multi-level feature maps are fused at the back-end and passed to the detection head. Additionally, in the inference stage, both RepDuConv and C2fRCB optimize their branch structures through reparameterization. Experimental results show that on the small object datasets VisDrone and DroneVehicle, the highest version of RC-SODet achieves 48.1% and 82.4% mAP50, as well as 30.1% and 59.1% mAP50-95, respectively. The designed reparameterization technique increases the model inference speed by 58.1%.
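The abstract does not specify how RepDuConv or C2fRCB merge their branches at inference time. As a generic illustration of structural reparameterization only (not the authors' method), the sketch below assumes a hypothetical block with a parallel 3x3 branch and 1x1 branch whose outputs are summed, single-channel input, zero padding, and stride 2 for downsampling. Because convolution is linear, the 1x1 kernel can be added to the center of the 3x3 kernel, so one convolution reproduces the two-branch output. The names `conv2d` and `merge_branches` are made up for this example.

```python
def conv2d(x, k, stride=1):
    """Cross-correlate a single-channel 2D list with a square, odd-sized
    kernel, using zero padding of size len(k) // 2 and the given stride."""
    pad = len(k) // 2
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, w, stride):
            acc = 0.0
            for di in range(len(k)):
                for dj in range(len(k)):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:  # outside the image counts as zero
                        acc += x[r][c] * k[di][dj]
            row.append(acc)
        out.append(row)
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 kernel into the center tap of a 3x3 kernel (assumed layout)."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


# Illustrative values only; integer-valued floats keep the comparison exact.
x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]
k3 = [[1.0, 0.0, -1.0],
      [2.0, 1.0, -2.0],
      [1.0, 0.0, -1.0]]
k1 = [[2.0]]

# Training-time form: two branches, outputs summed.
a = conv2d(x, k3, stride=2)
b = conv2d(x, k1, stride=2)
two_branch = [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

# Inference-time form: a single convolution with the merged kernel.
fused = conv2d(x, merge_branches(k3, k1), stride=2)

print(two_branch == fused)
```

The fused form runs one convolution instead of two, which is the general source of inference speedups from reparameterization; real blocks usually also fold batch normalization into the merged kernel, which this sketch omits.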
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.