Probability based dynamic soft label assignment for object detection

IF 4.2 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yi Li , Sile Ma , Xiangyuan Jiang , Yizhong Luan , Zecui Jiang
{"title":"Probability based dynamic soft label assignment for object detection","authors":"Yi Li ,&nbsp;Sile Ma ,&nbsp;Xiangyuan Jiang ,&nbsp;Yizhong Luan ,&nbsp;Zecui Jiang","doi":"10.1016/j.imavis.2024.105240","DOIUrl":null,"url":null,"abstract":"<div><p>By defining effective supervision labels for network training, the performance of object detectors can be improved without incurring additional inference costs. Current label assignment strategies generally require two steps: first, constructing a positive sample candidate bag, and then designing labels for these samples. However, the construction of candidate bag of positive samples may result in some noisy samples being introduced into the label assignment process. We explore a single-step label assignment approach: directly generating a probability map as labels for all samples. We design the label assignment approach from the following perspectives: Firstly, it should be able to reduce the impact of noise samples. Secondly, each sample should be treated differently because each one matches the target to a different extent, which assists the network to learn more valuable information from high-quality samples. We propose a probability-based dynamic soft label assignment method. Instead of dividing the samples into positive and negative samples, a probability map, which is calculated based on prediction quality and prior knowledge, is used to supervise all anchor points of the classification branch. The weight of prior knowledge in the labels decreases as the network improves the quality of instance predictions, as a way to reduce noise samples introduced by prior knowledge. By using continuous probability values as labels to supervise the classification branch, the network is able to focus on high-quality samples. As demonstrated in the experiments on the MS COCO benchmark, our label assignment method achieves 40.9% AP in the ResNet-50 under 1x schedule, which improves FCOS performance by approximately 2.0% AP. The code has been available at <span><span><span>https://github.com/Liyi4578/PDSLA</span></span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"150 ","pages":"Article 105240"},"PeriodicalIF":4.2000,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624003457","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

By defining effective supervision labels for network training, the performance of object detectors can be improved without incurring additional inference costs. Current label assignment strategies generally require two steps: first, constructing a positive sample candidate bag, and then designing labels for these samples. However, the construction of candidate bag of positive samples may result in some noisy samples being introduced into the label assignment process. We explore a single-step label assignment approach: directly generating a probability map as labels for all samples. We design the label assignment approach from the following perspectives: Firstly, it should be able to reduce the impact of noise samples. Secondly, each sample should be treated differently because each one matches the target to a different extent, which assists the network to learn more valuable information from high-quality samples. We propose a probability-based dynamic soft label assignment method. Instead of dividing the samples into positive and negative samples, a probability map, which is calculated based on prediction quality and prior knowledge, is used to supervise all anchor points of the classification branch. The weight of prior knowledge in the labels decreases as the network improves the quality of instance predictions, as a way to reduce noise samples introduced by prior knowledge. By using continuous probability values as labels to supervise the classification branch, the network is able to focus on high-quality samples. As demonstrated in the experiments on the MS COCO benchmark, our label assignment method achieves 40.9% AP in the ResNet-50 under 1x schedule, which improves FCOS performance by approximately 2.0% AP. The code has been available at https://github.com/Liyi4578/PDSLA.

基于概率的动态软标签分配用于物体检测
通过为网络训练定义有效的监督标签,可以在不增加推理成本的情况下提高物体检测器的性能。目前的标签分配策略一般需要两个步骤:首先,构建正样本候选包,然后为这些样本设计标签。然而,在构建阳性样本候选袋的过程中,可能会在标签分配过程中引入一些噪声样本。我们探索了一种单步标签分配方法:直接生成概率图作为所有样本的标签。我们从以下几个方面设计标签分配方法:首先,它应能减少噪声样本的影响。其次,应该区别对待每个样本,因为每个样本与目标的匹配程度不同,这有助于网络从高质量样本中学习到更多有价值的信息。我们提出了一种基于概率的动态软标签分配方法。我们不将样本分为正样本和负样本,而是使用基于预测质量和先验知识计算的概率图来监督分类分支的所有锚点。先验知识在标签中的权重会随着网络提高实例预测质量而降低,以此来减少先验知识带来的噪声样本。通过使用连续概率值作为标签来监督分类分支,网络能够将注意力集中在高质量样本上。正如在 MS COCO 基准实验中证明的那样,我们的标签分配方法在 1x 计划下的 ResNet-50 中实现了 40.9% 的 AP,将 FCOS 性能提高了约 2.0%。代码可在 https://github.com/Liyi4578/PDSLA 上获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Image and Vision Computing
Image and Vision Computing 工程技术-工程:电子与电气
CiteScore
8.50
自引率
8.50%
发文量
143
审稿时长
7.8 months
期刊介绍: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信