Increase the sensitivity of moderate examples for semantic image segmentation

IF 4.2 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Quan Tang , Fagui Liu , Dengke Zhang , Jun Jiang , Xuhao Tang , C.L. Philip Chen
DOI: 10.1016/j.imavis.2024.105357
Journal: Image and Vision Computing, Volume 154, Article 105357
Publication date: 2025-02-01
URL: https://www.sciencedirect.com/science/article/pii/S0262885624004621
Citations: 0

Abstract

Dominant paradigms in modern semantic segmentation cast the task as pixel-wise classification and train with the standard cross-entropy loss (CE). Although CE is intuitively straightforward and suitable for this task, it considers only the predicted score of the target category and ignores the rest of the probability distribution. We further observe that fitting hard examples, even when few, leads to over-fitting at test time, as their accumulated CE losses dominate training. Likewise, a large number of easy examples can distract model training. Based on this observation, this work presents a novel loss function we call Sensitive Loss (SL), which uses the overall predicted probability distribution to down-weight the contribution of extremely hard examples (outliers) and easy examples (inliers) during training, rapidly focusing model learning on moderate examples. In this manner, SL encourages the model to learn generalizable features rather than over-fit the details and noise carried by outliers. Thus, it is capable of alleviating over-fitting and improving generalization capacity. We also propose a dynamic Learning Rate Scaling (LRS) strategy to counteract the resulting gradient decrease and improve the performance of SL. Extensive experiments show that our Sensitive Loss is superior to existing handcrafted loss functions and on par with searched losses, generalizing well across a wide range of datasets and algorithms. Specifically, training with the proposed SL yields a notable 1.7% mIoU improvement for the off-the-shelf Mask2Former framework on the Cityscapes dataset.
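The abstract does not give the exact form of SL, so the following is only an illustrative sketch: the function name and the bell-shaped weighting are assumptions, not the authors' formula. It shows the general idea of re-weighting per-pixel cross-entropy so that moderate examples (target probability near 0.5) contribute most, while outliers (probability near 0) and inliers (probability near 1) are suppressed:

```python
import numpy as np

def sensitive_loss(p_target, eps=1e-7):
    """Illustrative 'sensitive' loss (hypothetical form): standard
    cross-entropy re-weighted by a bell-shaped factor 4*p*(1-p) that
    peaks at p = 0.5 (moderate examples) and vanishes at both extremes.

    p_target: array of predicted probabilities for the ground-truth class.
    """
    p = np.clip(p_target, eps, 1.0 - eps)
    ce = -np.log(p)                  # per-pixel cross-entropy
    weight = 4.0 * p * (1.0 - p)     # down-weights outliers and inliers
    return weight * ce

# Moderate examples dominate the weighted loss:
probs = np.array([0.01, 0.5, 0.99])  # outlier, moderate, inlier
losses = sensitive_loss(probs)
```

Under this toy weighting, the outlier's large CE term (about 4.6) is scaled down to roughly 0.18, the inlier's loss is nearly zero, and the moderate example retains the largest contribution, matching the behavior the abstract describes.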


Source journal: Image and Vision Computing
Category: Engineering & Technology, Engineering: Electrical & Electronic
CiteScore: 8.50
Self-citation rate: 8.50%
Articles per year: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.