R. Neven, D. Neven, Bert De Brabandere, M. Proesmans, Toon Goedemé
{"title":"Weakly-Supervised Semantic Segmentation by Learning Label Uncertainty","authors":"R. Neven, D. Neven, Bert De Brabandere, M. Proesmans, Toon Goedem'e","doi":"10.1109/ICCVW54120.2021.00193","DOIUrl":null,"url":null,"abstract":"Since the rise of deep learning, many computer vision tasks have seen significant advancements. However, the downside of deep learning is that it is very data-hungry. Especially for segmentation problems, training a deep neural net requires dense supervision in the form of pixel-perfect image labels, which are very costly. In this paper, we present a new loss function to train a segmentation network with only a small subset of pixel-perfect labels, but take the advantage of weakly-annotated training samples in the form of cheap bounding-box labels. Unlike recent works which make use of box-to-mask proposal generators, our loss trains the network to learn a label uncertainty within the bounding-box, which can be leveraged to perform online bootstrapping (i.e. transforming the boxes to segmentation masks), while training the network. We evaluated our method on binary segmentation tasks, as well as a multi-class segmentation task (CityScapes vehicles and persons). We trained each task on a dataset comprised of only 18% pixel-perfect and 82% bounding-box labels, and compared the results to a baseline model trained on a completely pixel-perfect dataset. For the binary segmentation tasks, our method achieves an IoU score which is 98.33% as good as our baseline model, while for the multi-class task, our method is 97.12% as good as our baseline model (77.5 vs. 79.8 mIoU).","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCVW54120.2021.00193","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Since the rise of deep learning, many computer vision tasks have seen significant advancements. However, the downside of deep learning is that it is very data-hungry. Especially for segmentation problems, training a deep neural net requires dense supervision in the form of pixel-perfect image labels, which are very costly. In this paper, we present a new loss function to train a segmentation network with only a small subset of pixel-perfect labels, while taking advantage of weakly-annotated training samples in the form of cheap bounding-box labels. Unlike recent works which make use of box-to-mask proposal generators, our loss trains the network to learn a label uncertainty within the bounding box, which can be leveraged to perform online bootstrapping (i.e. transforming the boxes into segmentation masks) while training the network. We evaluated our method on binary segmentation tasks, as well as on a multi-class segmentation task (Cityscapes vehicles and persons). We trained each task on a dataset comprising only 18% pixel-perfect and 82% bounding-box labels, and compared the results to a baseline model trained on a completely pixel-perfect dataset. For the binary segmentation tasks, our method achieves an IoU score that is 98.33% as good as our baseline model, while for the multi-class task, our method is 97.12% as good as our baseline model (77.5 vs. 79.8 mIoU).
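To make the idea of mixing a small pixel-perfect subset with box-labeled samples concrete, the sketch below shows one plausible way to combine standard cross-entropy on dense labels with an online-bootstrapped loss inside bounding boxes. This is not the paper's actual formulation: the paper learns a label uncertainty, whereas this sketch approximates bootstrapping with a hard confidence threshold, and all names (mixed_supervision_loss, conf_thresh, the 255 ignore index) are illustrative assumptions.

```python
# Hypothetical PyTorch sketch of mixed dense/box supervision with online
# bootstrapping; an approximation for illustration, not the authors' loss.
import torch
import torch.nn.functional as F

def mixed_supervision_loss(logits, dense_labels, box_masks, dense_valid,
                           conf_thresh=0.7):
    """
    logits:       (B, C, H, W) raw network outputs
    dense_labels: (B, H, W) pixel-perfect class indices
    box_masks:    (B, H, W) box class painted inside each box, 255 elsewhere
    dense_valid:  (B,) bool, True for samples with pixel-perfect labels
    """
    probs = logits.softmax(dim=1)
    loss = logits.new_zeros(())

    # 1) Full cross-entropy on the pixel-perfect subset (the 18%).
    if dense_valid.any():
        loss = loss + F.cross_entropy(logits[dense_valid],
                                      dense_labels[dense_valid])

    # 2) Online bootstrapping on the box-labeled subset (the 82%):
    #    inside a box, keep the box class only where the network is already
    #    confident; pixels outside any box are treated as background.
    weak = ~dense_valid
    if weak.any():
        weak_logits = logits[weak]
        weak_boxes = box_masks[weak]
        weak_probs = probs[weak]

        pseudo = torch.zeros_like(weak_boxes)          # background by default
        inside = weak_boxes != 255
        box_cls = weak_boxes.clamp(max=weak_probs.shape[1] - 1)
        conf = weak_probs.gather(1, box_cls.unsqueeze(1)).squeeze(1)
        confident_fg = inside & (conf > conf_thresh)
        pseudo[confident_fg] = weak_boxes[confident_fg]

        # Low-confidence pixels inside the box are ignored, a crude stand-in
        # for the learned label uncertainty described in the paper.
        pseudo[inside & ~confident_fg] = 255
        loss = loss + F.cross_entropy(weak_logits, pseudo, ignore_index=255)

    return loss
```

In this reading, the bounding box only constrains where foreground may appear; which pixels inside the box actually count as foreground is decided online from the network's own predictions as training progresses.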