On the Importance of Label Quality for Semantic Segmentation

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1479-1487. Published: 2018-06-01. DOI: 10.1109/CVPR.2018.00160

A. Zlateski, Ronnachai Jaroensri, Prafull Sharma, F. Durand


Citations: 67

Abstract



Convolutional networks (ConvNets) have become the dominant approach to semantic image segmentation. Producing the accurate, pixel-level labels required for this task is a tedious and time-consuming process; however, producing approximate, coarse labels could take only a fraction of the time and effort. We investigate the relationship between the quality of labels and the performance of ConvNets for semantic segmentation. We create a very large synthetic dataset with perfectly labeled street view scenes. From these perfect labels, we synthetically coarsen labels to different quality levels and estimate the human-hours required to produce them. We perform a series of experiments by training ConvNets with varying numbers of training images and varying label quality. We find that the performance of ConvNets mostly depends on the time spent creating the training labels. That is, a larger coarsely-annotated dataset can yield the same performance as a smaller finely-annotated one. Furthermore, fine-tuning coarsely pre-trained ConvNets with a few finely-annotated labels can yield comparable or superior performance to training them with a large amount of finely-annotated labels alone, at a fraction of the labeling cost. We demonstrate that our result also holds for different network architectures and for various object classes in an urban scene.
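
The abstract does not say how the perfect labels were coarsened, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a hypothetical coarsening rule (each square tile of the label map is overwritten with its majority class) and uses mean intersection-over-union against the fine labels as a rough proxy for label quality. The function names `coarsen_labels` and `mean_iou`, the tile size, and the toy 8x8 mask are all assumptions made for illustration.

```python
import numpy as np


def coarsen_labels(labels: np.ndarray, block: int, num_classes: int) -> np.ndarray:
    """Hypothetical coarsening: overwrite each block x block tile with its majority class."""
    h, w = labels.shape
    if h % block or w % block:
        raise ValueError("label map size must be divisible by block")
    # Group pixels by tile: result has shape (h/block, w/block, block*block).
    tiles = (
        labels.reshape(h // block, block, w // block, block)
        .transpose(0, 2, 1, 3)
        .reshape(h // block, w // block, block * block)
    )
    # Count votes per class in every tile; argmax breaks ties toward the lower class id.
    votes = np.stack([(tiles == c).sum(axis=-1) for c in range(num_classes)], axis=-1)
    majority = votes.argmax(axis=-1)
    # Paint the winning class back over the whole tile.
    return np.repeat(np.repeat(majority, block, axis=0), block, axis=1)


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union over classes present in either map."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union:  # skip classes absent from both maps
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


if __name__ == "__main__":
    # Toy "perfect" label map: a 5x5 object (class 1) on background (class 0).
    fine = np.zeros((8, 8), dtype=np.int64)
    fine[1:6, 1:6] = 1
    for block in (1, 2, 4):
        coarse = coarsen_labels(fine, block, num_classes=2)
        print(block, round(mean_iou(coarse, fine, num_classes=2), 3))
```

Larger tiles give labels that agree less with the fine mask; in a study like the one described, each such quality level would additionally be paired with an estimate of the annotation time it represents.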
