Data for Image Recognition Tasks: An Efficient Tool for Fine-Grained Annotations

International Conference on Pattern Recognition Applications and Methods. Pub Date: 2019-02-19. DOI: 10.5220/0007688709000907

Marco Filax, Tim Gonschorek, F. Ortmeier


Citations: 5

Abstract

Using large datasets is essential for machine learning. In practice, training a machine learning algorithm requires hundreds of samples. Multiple off-the-shelf datasets from the scientific domain exist to benchmark new approaches. However, when machine learning algorithms transition to industry, e.g., for a particular image classification problem, hundreds of special-purpose images must be collected and annotated through laborious manual work. In this paper, we present a novel system that reduces the effort of annotating such large image sets. To this end, we generate 2D bounding boxes from minimal 3D annotations using the known location and orientation of the camera. We annotate a particular object of interest in 3D once and project these annotations onto every frame of a video stream. The proposed approach is designed to work with off-the-shelf hardware. We demonstrate its applicability with a real-world example. For a particular industrial use case, fine-grained recognition of items within grocery stores, we generated a dataset more extensive than those available in other works. Further, we make our dataset, consisting of over 60,000 images, available to the interested vision community. Some images were taken under ideal conditions for training, while others were captured in the wild with the proposed approach.
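The core idea described above, annotating an object once in 3D and projecting that annotation into every frame using the known camera location and orientation, can be sketched as follows. This is not the authors' implementation. It assumes an ideal pinhole camera without lens distortion, an axis-aligned 3D box as the annotation, and a world-to-camera rotation and translation available for each frame. `CameraFrame`, `box_corners`, `project_point`, `project_box`, and `annotate_stream` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class CameraFrame:
    """Hypothetical per-frame camera: pinhole intrinsics plus world-to-camera pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    rotation: Mat3     # world-to-camera rotation matrix, row-major (assumed known)
    translation: Vec3  # world-to-camera translation (assumed known)
    width: int
    height: int


def box_corners(min_corner: Vec3, max_corner: Vec3) -> List[Vec3]:
    """Return the 8 corners of an axis-aligned 3D box."""
    return [
        tuple(hi if pick else lo for lo, hi, pick in zip(min_corner, max_corner, bits))
        for bits in product((False, True), repeat=3)
    ]


def project_point(p: Vec3, cam: CameraFrame) -> Optional[Tuple[float, float]]:
    """Project one world point to pixel coordinates with a distortion-free pinhole model."""
    # Transform into camera coordinates: X_c = R * X_w + t
    xc, yc, zc = (
        sum(r * x for r, x in zip(row, p)) + t
        for row, t in zip(cam.rotation, cam.translation)
    )
    if zc <= 0:
        return None  # point lies behind the camera
    return cam.fx * xc / zc + cam.cx, cam.fy * yc / zc + cam.cy


def project_box(min_corner: Vec3, max_corner: Vec3, cam: CameraFrame) -> Optional[Box2D]:
    """Turn a single 3D box annotation into a 2D bounding box for one frame."""
    pts = [project_point(c, cam) for c in box_corners(min_corner, max_corner)]
    if any(p is None for p in pts):
        return None  # simplification: skip frames where the box crosses behind the camera
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    # Enclosing rectangle of the projected corners, clipped to the image bounds
    x0, x1 = max(0.0, min(us)), min(float(cam.width), max(us))
    y0, y1 = max(0.0, min(vs)), min(float(cam.height), max(vs))
    if x0 >= x1 or y0 >= y1:
        return None  # object does not appear in this frame
    return x0, y0, x1, y1


def annotate_stream(min_corner: Vec3, max_corner: Vec3,
                    frames: List[CameraFrame]) -> List[Optional[Box2D]]:
    """Annotate once in 3D, then reuse that annotation for every frame of a stream."""
    return [project_box(min_corner, max_corner, cam) for cam in frames]
```

For example, with an identity rotation, zero translation, `fx = fy = 500`, principal point `(320, 240)`, and a 640×480 image, the box spanning `(-0.5, -0.5, 4)` to `(0.5, 0.5, 6)` projects to `(257.5, 177.5, 382.5, 302.5)`. Taking the minimum and maximum over the eight projected corners gives the tightest axis-aligned 2D rectangle around the projected 3D box; this can be somewhat looser than a manual box drawn around the object itself, and the sketch simply skips frames where part of the box lies behind the camera rather than clipping it in 3D.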
