Quan Tang, Fagui Liu, Dengke Zhang, Jun Jiang, Xuhao Tang, C.L. Philip Chen

Image and Vision Computing, Volume 154, Article 105357, February 2025. DOI: 10.1016/j.imavis.2024.105357
Increase the sensitivity of moderate examples for semantic image segmentation
Dominant paradigms in modern semantic segmentation adopt pixel-wise classification and supervised training with the standard cross-entropy (CE) loss. Although CE is intuitively straightforward and well suited to the task, it considers only the predicted score of the target category and ignores the rest of the probability distribution. We further observe that fitting hard examples, even when they are few, leads to over-fitting at test time, as their accumulated CE losses overwhelm the model during training; a large number of easy examples may likewise distract model training. Based on these observations, this work presents a novel loss function we call Sensitive Loss (SL), which exploits the full predicted probability distribution to down-weight the contribution of extremely hard examples (outliers) and easy examples (inliers) during training, rapidly focusing model learning on moderate examples. In this manner, SL encourages the model to learn generalizable features rather than the details and noise implied by outliers, thereby alleviating over-fitting and improving generalization capacity. We also propose a dynamic Learning Rate Scaling (LRS) strategy to counter diminishing gradients and further improve the performance of SL. Extensive experiments show that Sensitive Loss is superior to existing handcrafted loss functions and on par with searched losses, and that it generalizes well across a wide range of datasets and algorithms. Notably, training with the proposed SL brings a 1.7% mIoU improvement for the Mask2Former framework on the Cityscapes dataset off the shelf.
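The abstract does not give the exact form of Sensitive Loss, but the idea it describes, down-weighting both extremely hard examples (very low target probability) and easy examples (very high target probability) so that moderate examples dominate, can be illustrated with a minimal sketch. The bell-shaped weight `alpha * p_t * (1 - p_t)` below is a hypothetical choice for illustration only, not the paper's actual formulation:

```python
import numpy as np

def sensitive_weight(p_t, alpha=4.0):
    """Bell-shaped weight over the predicted target probability p_t.

    Near 0 for outliers (p_t -> 0) and inliers (p_t -> 1),
    and maximal for moderate examples (p_t around 0.5).
    """
    return alpha * p_t * (1.0 - p_t)

def sensitive_loss_sketch(logits, targets, alpha=4.0):
    """Weighted cross-entropy over per-pixel predictions.

    logits: (N, C) array of class scores, one row per pixel.
    targets: (N,) array of integer class labels.
    """
    # Numerically stable softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    probs = exp_z / exp_z.sum(axis=1, keepdims=True)
    # Probability assigned to the target class of each pixel.
    p_t = probs[np.arange(len(targets)), targets]
    ce = -np.log(p_t + 1e-12)
    # Down-weight outliers and inliers; moderate examples dominate.
    return (sensitive_weight(p_t, alpha) * ce).mean()
```

Under this sketch, a pixel predicted with near-zero target probability (a likely outlier) contributes almost nothing, unlike plain CE where its loss would be very large and could dominate training.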
Journal overview:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.