GLAMI-1M: A Multilingual Image-Text Fashion Dataset

Vaclav Kosar, A. Hoskovec, Milan Šulc, Radek Bartyzal
{"title":"GLAMI-1M: A Multilingual Image-Text Fashion Dataset","authors":"Vaclav Kosar, A. Hoskovec, Milan Šulc, Radek Bartyzal","doi":"10.48550/arXiv.2211.14451","DOIUrl":null,"url":null,"abstract":"We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in 1 of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset presents a challenging fine-grained classification problem: The best scoring EmbraceNet model using both visual and textual features achieves 69.7% accuracy. Experiments with a modified Imagen model show the dataset is also suitable for image generation conditioned on text. The dataset, source code and model checkpoints are published at https://github.com/glami/glami-1m","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"202 1","pages":"607"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2211.14451","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in 1 of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset presents a challenging fine-grained classification problem: The best scoring EmbraceNet model using both visual and textual features achieves 69.7% accuracy. Experiments with a modified Imagen model show the dataset is also suitable for image generation conditioned on text. The dataset, source code and model checkpoints are published at https://github.com/glami/glami-1m
GLAMI-1M:多语言图像-文本时尚数据集
我们介绍了GLAMI-1M:最大的多语言图像-文本分类数据集和基准。该数据集包含带有商品描述的时尚产品图像,每种图像用13种语言中的一种进行描述。将191个类分类具有高质量的注释:测试集中的所有100k图像和1M训练集中的75%都是人工标记的。本文提出了图像-文本分类的基线,表明数据集提出了一个具有挑战性的细粒度分类问题:使用视觉和文本特征的最佳评分的恩布拉enet模型达到了69.7%的准确率。使用改进的Imagen模型进行的实验表明,该数据集也适用于以文本为条件的图像生成。数据集、源代码和模型检查点在https://github.com/glami/glami-1m上发布
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信