Towards Identification of Packaged Products via Computer Vision: Convolutional Neural Networks for Object Detection and Image Classification in Retail Environments

Proceedings of the 9th International Conference on the Internet of Things. Published: 2019-10-22. DOI: 10.1145/3365871.3365899

K. Fuchs, T. Grundmann, E. Fleisch


Citations: 26

Abstract

Identification of packaged products in retail environments still relies on barcodes, which require active user input and are limited to one product at a time. Computer vision (CV) has already enabled many applications but has so far been under-discussed in the retail domain, although it allows for faster, hands-free, more natural human-object interaction (e.g., via mixed reality headsets). To assess the potential of current convolutional neural network (CNN) architectures to reliably identify packaged products within a retail environment, we created and open-sourced a dataset of 300 images of vending machines with 15k labeled instances of 90 products. We assessed the accuracies obtained with transfer learning for image-based product classification (IC) and multi-product object detection (OD) on multiple CNN architectures, as well as the number of image instances required per product to achieve meaningful predictions. Results show that as few as six images suffice for 90% IC accuracy, but around 30 images are needed for 95% IC accuracy. For simultaneous OD, 42 instances per product are necessary, and far more than 100 instances are needed to produce robust results. Thus, this study demonstrates that even in realistic, fast-paced retail environments, image-based product identification provides an alternative to barcodes, especially for use cases that do not require perfect 100% accuracy.
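The abstract's dataset figures can be related to its detection thresholds with simple arithmetic. The sketch below is a back-of-the-envelope check, not part of the authors' method. It assumes "15k" means exactly 15,000 instances and computes only averages, because the per-product distribution of the released dataset is not stated here. All constant and function names are hypothetical.

```python
"""Back-of-the-envelope dataset density check (illustrative only).

Assumes the abstract's rounded figures: ~15,000 labeled instances
("15k"), 300 images, 90 products. Real per-product counts may be
uneven, so the values computed here are averages only.
"""

TOTAL_INSTANCES = 15_000  # assumption: "15k" taken as exactly 15,000
NUM_IMAGES = 300
NUM_PRODUCTS = 90

# Object-detection instance thresholds reported in the abstract
OD_MIN_INSTANCES = 42      # reported minimum instances per product
OD_ROBUST_INSTANCES = 100  # robust results need "far more than" this


def density(total_instances: int, num_images: int, num_products: int) -> dict:
    """Return the average number of labeled instances per image and per product."""
    return {
        "per_image": total_instances / num_images,
        "per_product": total_instances / num_products,
    }


def meets_od_thresholds(per_product: float) -> dict:
    """Compare an average per-product instance count with the reported OD thresholds."""
    return {
        "minimum": per_product >= OD_MIN_INSTANCES,
        "above_robust_floor": per_product > OD_ROBUST_INSTANCES,
    }


if __name__ == "__main__":
    d = density(TOTAL_INSTANCES, NUM_IMAGES, NUM_PRODUCTS)
    print(f"instances per image:   {d['per_image']:.1f}")
    print(f"instances per product: {d['per_product']:.1f}")
    print(meets_od_thresholds(d["per_product"]))
```

Running it prints 50.0 instances per image and about 166.7 instances per product. That average clears the 42-instance minimum and exceeds 100, but whether every product has "far more than 100" instances depends on how evenly the labels are spread across the 90 products.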


