Pretraining instance segmentation models with bounding box annotations

Intelligent Systems with Applications. Pub Date: 2024-10-28. DOI: 10.1016/j.iswa.2024.200454

Cathaoir Agnew, Eoin M. Grua, Pepijn Van de Ven, Patrick Denny, Ciarán Eising, Anthony Scanlan

Volume 24, Article 200454. Full text: https://www.sciencedirect.com/science/article/pii/S2667305324001285

Citations: 0

Abstract

Annotating datasets for fully supervised instance segmentation is arduous and time-consuming, requiring a significant investment of effort and cost. Producing bounding box annotations instead substantially reduces this investment, but bounding box annotations alone are not suitable for training instance segmentation models. This work uses ground truth bounding boxes to define coarsely annotated polygon masks, which we refer to as weak annotations, on which the models are pretrained. We investigate the effect of pretraining on data with weak annotations and then fine-tuning on data with strong annotations, that is, finely annotated polygon masks for instance segmentation. Experiments were conducted on the COCO 2017 detection dataset with three model architectures: SOLOv2, Mask-RCNN, and Mask2former. The Cityscapes and Pascal VOC 2012 datasets were used to validate the approach. The empirical results suggest two key outcomes. First, a sequential approach to annotating large-scale instance segmentation datasets would be beneficial, enabling higher-performing models in shorter timeframes: label bounding boxes on the data first, then polygon masks. Second, object detection datasets can be leveraged for pretraining instance segmentation models while maintaining competitive results in the downstream task. For the best-performing model, Mask2former with a Swin-L backbone, this approach achieved 97.5%, 100.4%, and 101.3% of the fully supervised performance using just 1%, 5%, and 10% of the COCO training set's instance segmentation annotations, respectively.
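
To make the idea of a weak annotation concrete, the sketch below converts a COCO-style bounding box into a rectangular polygon mask. It is an illustration under stated assumptions, not the authors' code: the abstract does not say exactly how the coarse polygons are built, so the sketch assumes the simplest case, where the weak mask is the box itself. The helper names `bbox_to_polygon` and `to_weak_annotation` are hypothetical; only the `[x, y, width, height]` box layout and the flat-list polygon layout follow the COCO annotation format.

```python
from typing import Dict, List


def bbox_to_polygon(bbox: List[float]) -> List[List[float]]:
    """Turn a COCO-style [x, y, width, height] box into a 4-corner polygon.

    The result uses the COCO polygon layout: a list of flat
    [x1, y1, x2, y2, ...] coordinate lists, one per polygon.
    """
    x, y, w, h = bbox
    # Corners in order: top-left, top-right, bottom-right, bottom-left.
    return [[x, y, x + w, y, x + w, y + h, x, y + h]]


def to_weak_annotation(ann: Dict) -> Dict:
    """Return a copy of an annotation whose mask is just its bounding box.

    Assumption: the weak mask is the full box rectangle; the paper may
    construct its coarse polygons differently.
    """
    weak = dict(ann)  # shallow copy so the input annotation is not modified
    weak["segmentation"] = bbox_to_polygon(ann["bbox"])
    weak["area"] = ann["bbox"][2] * ann["bbox"][3]  # rectangle area = w * h
    return weak


# Made-up annotation for illustration only (not taken from COCO).
example = {"id": 1, "category_id": 3, "bbox": [10.0, 20.0, 30.0, 40.0], "iscrowd": 0}
weak = to_weak_annotation(example)
print(weak["segmentation"])  # [[10.0, 20.0, 40.0, 20.0, 40.0, 60.0, 10.0, 60.0]]
```

Under this assumption, a detection-only dataset could be rewritten once with such a helper and used for pretraining, with the finely annotated polygon masks reserved for the fine-tuning stage.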




Source journal

Intelligent Systems with Applications

CiteScore

5.60

Self-citation rate

0.00%

Number of publications