A Novel Dense Object Detector With Scale Balanced Sample Assignment and Refinement
Authors: Jinpeng Dong; Dingyi Yao; Yufeng Hu; Sanping Zhou; Nanning Zheng
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9337-9350
Publication date: 2025-03-18
DOI: 10.1109/TCSVT.2025.3551912
URL: https://ieeexplore.ieee.org/document/10929026/
Impact factor: 11.1 (JCR Q1, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Scale variation of objects remains one of the crucial challenges in object detection. Conventional dense detectors with fixed receptive fields and label weights are poorly suited to detecting multi-scale objects. However, design limitations in existing studies, namely unbalanced label weights and fixed refinement for multi-scale objects and multiple tasks, make it difficult to achieve better detection performance. In this paper, we propose a novel dense detector named Balanced FCOS, which consists of two components: Balanced Label Assignment (BLA) and Flexible Shape-based Refinement (FSR). The BLA implements scale-balanced sample assignment by introducing reweighting factors, built from localization and classification scores, into the label assignment. In this way, the BLA down-weights low-quality samples that would otherwise receive high weights. Furthermore, we design a cross-reweighting mechanism in the BLA to ensure score consistency between classification and localization. The FSR implements scale-balanced sample refinement: based on each object's coarse features, it learns flexible sampling-point offsets for multi-scale objects and multiple tasks, yielding more discriminative features with an appropriate receptive field. In addition, the better features obtained by the FSR lead to better classification and localization scores, which the BLA can use to produce more accurate label weights. Equipped with only the BLA, we achieve 41.7/46.6 AP with R50/R101-FCOS without any additional parameters. When the BLA is combined with the FSR, our Balanced FCOS achieves state-of-the-art (SOTA) results among dense detectors on the COCO test-dev set. Experiments on other heads (T-Head, DyHead), detectors (DINO), and datasets (AI-TOD) further demonstrate the effectiveness of our method.
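The abstract does not give the BLA formulas, so the following is only an illustrative sketch of what score-based cross-reweighting could look like, not the authors' implementation. The function name `cross_reweight`, the exponents `alpha` and `beta`, the product form of the quality term, and the per-object max normalization are all assumptions made for this example.

```python
from typing import List, Tuple


def cross_reweight(
    cls_scores: List[float],
    ious: List[float],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical label weights for the candidate samples of ONE object.

    cls_scores: predicted classification score of each candidate, in [0, 1].
    ious:       IoU of each candidate's predicted box with the ground truth.
    Returns (classification-loss weights, localization-loss weights).
    """
    if len(cls_scores) != len(ious):
        raise ValueError("cls_scores and ious must have the same length")
    if not cls_scores:
        return [], []

    # Joint quality (assumed form): a sample ranks high only if it does
    # well on both classification and localization.
    quality = [(s ** alpha) * (u ** beta) for s, u in zip(cls_scores, ious)]

    # Cross step (assumed form): the classification weight is scaled by the
    # localization score and the localization weight by the class score,
    # which pushes the two tasks toward agreeing on the same samples.
    cls_w = [q * u for q, u in zip(quality, ious)]
    loc_w = [q * s for q, s in zip(quality, cls_scores)]

    def normalize(weights: List[float]) -> List[float]:
        # Per-object max normalization (assumed): the best sample of every
        # object gets weight 1.0 regardless of the object's scale.
        top = max(weights)
        if top <= 0:
            return [0.0 for _ in weights]
        return [w / top for w in weights]

    return normalize(cls_w), normalize(loc_w)


if __name__ == "__main__":
    # Two candidates for one object: one good on both tasks, one weak.
    cls_w, loc_w = cross_reweight([0.9, 0.2], [0.8, 0.3])
    print(cls_w, loc_w)
```

In this sketch the weak candidate receives a much smaller weight than the strong one in both losses, which mirrors the abstract's description of weakening low-quality samples. Normalizing per object is one simple way to keep small and large objects on a comparable weight scale; the paper may balance scales differently.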
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.