Towards Human Performance on Sketch-Based Image Retrieval

Omar Seddati, S. Dupont, S. Mahmoudi, T. Dutoit
Published in: Proceedings of the 19th International Conference on Content-based Multimedia Indexing, 2022-09-14
DOI: 10.1145/3549555.3549582 (https://doi.org/10.1145/3549555.3549582)
Citation count: 1

Abstract

Sketch-based image retrieval (SBIR) solutions are attracting increasing interest in the field of computer vision. These solutions provide an intuitive and powerful tool for retrieving images from large-scale image databases. In this paper, we conduct a comprehensive study of classic triplet CNN training pipelines in the SBIR context. We study the impact of embedding normalization, model sharing, margin selection, batch size, hard mining selection, and the evolution of the number of hard triplets during training, and propose several avenues for improvement. We also propose dropout column, an adaptation of dropout for triplet networks and similar pipelines. In addition, we introduce a novel approach for building state-of-the-art SBIR solutions that can run on low-power systems. The whole study is conducted on The Sketchy Database, a large-scale SBIR database. We carry out a series of experiments and show that a few simple modifications significantly improve existing SBIR pipelines (faster training and higher accuracy). Our study enables us to propose an enhanced pipeline that outperforms the previous state of the art on the Sketchy Database by a significant margin (a recall of 53.92% compared to 46.2% at k = 1) and almost reaches human performance (54.27%) on a large-scale benchmark.
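To make the quantities named in the abstract concrete, the following is a minimal sketch of a triplet hinge loss with optional L2 embedding normalization, a count of the triplets that still violate the margin, and recall at k. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the use of squared Euclidean distance, the margin value of 0.2, and the definition of a "hard" triplet as one with non-zero loss are all hypothetical choices made for this example.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_losses(anchors, positives, negatives, margin=0.2, normalize=True):
    """Per-triplet hinge loss max(0, d(a, p) - d(a, n) + margin).

    anchors are sketch embeddings, positives the matching image embeddings,
    negatives non-matching image embeddings. The margin default is an
    assumption for illustration, not a value taken from the paper.
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        if normalize:
            a, p, n = l2_normalize(a), l2_normalize(p), l2_normalize(n)
        losses.append(max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin))
    return losses

def count_hard_triplets(losses):
    """Count triplets that still violate the margin (non-zero loss).

    Assumed definition of "hard"; the paper may define it differently.
    """
    return sum(1 for loss in losses if loss > 0.0)

def recall_at_k(sketch_embs, image_embs, true_index, k=1):
    """Fraction of sketches whose matching image is among the k nearest images."""
    hits = 0
    for s, target in zip(sketch_embs, true_index):
        ranked = sorted(range(len(image_embs)),
                        key=lambda j: sq_dist(s, image_embs[j]))
        hits += target in ranked[:k]
    return hits / len(sketch_embs)
```

Tracking count_hard_triplets(triplet_losses(...)) per batch is one simple way to observe how the number of margin-violating triplets evolves during training, and recall_at_k with k=1 corresponds to the kind of metric the abstract reports.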