Probability based dynamic soft label assignment for object detection

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2024-08-31 DOI:10.1016/j.imavis.2024.105240

Yi Li , Sile Ma , Xiangyuan Jiang , Yizhong Luan , Zecui Jiang

{"title":"Probability based dynamic soft label assignment for object detection","authors":"Yi Li , Sile Ma , Xiangyuan Jiang , Yizhong Luan , Zecui Jiang","doi":"10.1016/j.imavis.2024.105240","DOIUrl":null,"url":null,"abstract":"<div><p>By defining effective supervision labels for network training, the performance of object detectors can be improved without incurring additional inference costs. Current label assignment strategies generally require two steps: first, constructing a positive sample candidate bag, and then designing labels for these samples. However, the construction of candidate bag of positive samples may result in some noisy samples being introduced into the label assignment process. We explore a single-step label assignment approach: directly generating a probability map as labels for all samples. We design the label assignment approach from the following perspectives: Firstly, it should be able to reduce the impact of noise samples. Secondly, each sample should be treated differently because each one matches the target to a different extent, which assists the network to learn more valuable information from high-quality samples. We propose a probability-based dynamic soft label assignment method. Instead of dividing the samples into positive and negative samples, a probability map, which is calculated based on prediction quality and prior knowledge, is used to supervise all anchor points of the classification branch. The weight of prior knowledge in the labels decreases as the network improves the quality of instance predictions, as a way to reduce noise samples introduced by prior knowledge. By using continuous probability values as labels to supervise the classification branch, the network is able to focus on high-quality samples. As demonstrated in the experiments on the MS COCO benchmark, our label assignment method achieves 40.9% AP in the ResNet-50 under 1x schedule, which improves FCOS performance by approximately 2.0% AP. The code has been available at <span><span><span>https://github.com/Liyi4578/PDSLA</span></span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105240"},"PeriodicalIF":4.2000,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624003457","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

By defining effective supervision labels for network training, the performance of object detectors can be improved without incurring additional inference costs. Current label assignment strategies generally require two steps: first, constructing a positive sample candidate bag, and then designing labels for these samples. However, the construction of candidate bag of positive samples may result in some noisy samples being introduced into the label assignment process. We explore a single-step label assignment approach: directly generating a probability map as labels for all samples. We design the label assignment approach from the following perspectives: Firstly, it should be able to reduce the impact of noise samples. Secondly, each sample should be treated differently because each one matches the target to a different extent, which assists the network to learn more valuable information from high-quality samples. We propose a probability-based dynamic soft label assignment method. Instead of dividing the samples into positive and negative samples, a probability map, which is calculated based on prediction quality and prior knowledge, is used to supervise all anchor points of the classification branch. The weight of prior knowledge in the labels decreases as the network improves the quality of instance predictions, as a way to reduce noise samples introduced by prior knowledge. By using continuous probability values as labels to supervise the classification branch, the network is able to focus on high-quality samples. As demonstrated in the experiments on the MS COCO benchmark, our label assignment method achieves 40.9% AP in the ResNet-50 under 1x schedule, which improves FCOS performance by approximately 2.0% AP. The code has been available at https://github.com/Liyi4578/PDSLA.

查看原文本刊更多论文

基于概率的动态软标签分配用于物体检测

通过为网络训练定义有效的监督标签，可以在不增加推理成本的情况下提高物体检测器的性能。目前的标签分配策略一般需要两个步骤：首先，构建正样本候选包，然后为这些样本设计标签。然而，在构建阳性样本候选袋的过程中，可能会在标签分配过程中引入一些噪声样本。我们探索了一种单步标签分配方法：直接生成概率图作为所有样本的标签。我们从以下几个方面设计标签分配方法：首先，它应能减少噪声样本的影响。其次，应该区别对待每个样本，因为每个样本与目标的匹配程度不同，这有助于网络从高质量样本中学习到更多有价值的信息。我们提出了一种基于概率的动态软标签分配方法。我们不将样本分为正样本和负样本，而是使用基于预测质量和先验知识计算的概率图来监督分类分支的所有锚点。先验知识在标签中的权重会随着网络提高实例预测质量而降低，以此来减少先验知识带来的噪声样本。通过使用连续概率值作为标签来监督分类分支，网络能够将注意力集中在高质量样本上。正如在 MS COCO 基准实验中证明的那样，我们的标签分配方法在 1x 计划下的 ResNet-50 中实现了 40.9% 的 AP，将 FCOS 性能提高了约 2.0%。代码可在 https://github.com/Liyi4578/PDSLA 上获取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.