GLAMI-1M: A Multilingual Image-Text Fashion Dataset

BMVC: Proceedings of the British Machine Vision Conference. Pub Date: 2022-11-17. DOI: 10.48550/arXiv.2211.14451

Vaclav Kosar, A. Hoskovec, Milan Šulc, Radek Bartyzal


Abstract

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. The categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification, which show that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, which uses both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show that the dataset is also suitable for text-conditioned image generation. The dataset, source code, and model checkpoints are published at https://github.com/glami/glami-1m
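As a rough illustration of the figures quoted in the abstract, the following minimal Python sketch works out how many samples the human-labeled portions would contain and shows how a top-1 accuracy such as the reported 69.7% is typically computed. The sizes are the nominal "1M" and "100k" values from the abstract, so the exact counts may differ; the function names `human_labeled_counts` and `top1_accuracy` are hypothetical and are not taken from the GLAMI-1M repository.

```python
# Nominal figures quoted in the abstract; the exact dataset counts may differ.
TRAIN_SIZE = 1_000_000                 # "1M training set"
TEST_SIZE = 100_000                    # "100k images in the test set"
HUMAN_LABELED_TRAIN_FRACTION = 0.75    # "75% of the 1M training set"


def human_labeled_counts(train_size=TRAIN_SIZE,
                         test_size=TEST_SIZE,
                         train_fraction=HUMAN_LABELED_TRAIN_FRACTION):
    """Return the approximate number of human-labeled samples per split.

    The test set is assumed to be fully human-labeled, as the abstract states.
    """
    train_labeled = round(train_size * train_fraction)
    return {
        "train": train_labeled,
        "test": test_size,
        "total": train_labeled + test_size,
    }


def top1_accuracy(predicted, actual):
    """Fraction of samples whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    print(human_labeled_counts())
    # Toy class ids only; these are not GLAMI-1M predictions.
    print(top1_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))
```

Under these nominal sizes, roughly 750k training samples and all 100k test samples would carry human labels.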
