面向大规模多标签图像检索的多级语义相似度学习

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI:10.1145/3206025.3206027

Ge Song, Xiaoyang Tan

{"title":"面向大规模多标签图像检索的多级语义相似度学习","authors":"Ge Song, Xiaoyang Tan","doi":"10.1145/3206025.3206027","DOIUrl":null,"url":null,"abstract":"We present a novel Deep Supervised Hashing with code operation (DSOH) method for large-scale multi-label image retrieval. This approach is in contrast with existing methods in that we respect both the intention gap and the intrinsic multilevel similarity of multi-labels. Particularly, our method allows a user to simultaneously present multiple query images rather than a single one to better express her intention, and correspondingly a separate sub-network in our architecture is specifically designed to fuse the query intention represented by each single query. Furthermore, as in the training stage, each image is annotated with multiple labels to enrich its semantic representation, we propose a new margin-adaptive triplet loss to learn the fine-grained similarity structure of multi-labels, which is known to be hard to capture. The whole system is trained in an end-to-end manner, and our experimental results demonstrate that the proposed method is not only able to learn useful multilevel semantic similarity-preserving binary codes but also achieves state-of-the-art retrieval performance on three popular datasets.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Learning Multilevel Semantic Similarity for Large-Scale Multi-Label Image Retrieval\",\"authors\":\"Ge Song, Xiaoyang Tan\",\"doi\":\"10.1145/3206025.3206027\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a novel Deep Supervised Hashing with code operation (DSOH) method for large-scale multi-label image retrieval. This approach is in contrast with existing methods in that we respect both the intention gap and the intrinsic multilevel similarity of multi-labels. Particularly, our method allows a user to simultaneously present multiple query images rather than a single one to better express her intention, and correspondingly a separate sub-network in our architecture is specifically designed to fuse the query intention represented by each single query. Furthermore, as in the training stage, each image is annotated with multiple labels to enrich its semantic representation, we propose a new margin-adaptive triplet loss to learn the fine-grained similarity structure of multi-labels, which is known to be hard to capture. The whole system is trained in an end-to-end manner, and our experimental results demonstrate that the proposed method is not only able to learn useful multilevel semantic similarity-preserving binary codes but also achieves state-of-the-art retrieval performance on three popular datasets.\",\"PeriodicalId\":224132,\"journal\":{\"name\":\"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval\",\"volume\":\"82 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3206025.3206027\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3206025.3206027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

摘要

提出了一种新的基于代码操作的深度监督哈希(DSOH)方法，用于大规模多标签图像检索。该方法与现有方法的不同之处在于，我们既尊重多标签的意图差距，又尊重多标签的内在多层次相似性。特别是，我们的方法允许用户同时呈现多个查询图像，而不是单个图像，以更好地表达她的意图，相应地，我们的体系结构中专门设计了一个单独的子网络来融合每个单个查询所表示的查询意图。此外，由于在训练阶段，每张图像都被标注了多个标签以丰富其语义表示，我们提出了一种新的边缘自适应三重损失来学习多标签的细粒度相似结构，这是已知难以捕获的。整个系统以端到端方式进行训练，实验结果表明，该方法不仅能够学习有用的多级语义相似保持二进制码，而且在三个流行的数据集上取得了最先进的检索性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Learning Multilevel Semantic Similarity for Large-Scale Multi-Label Image Retrieval

We present a novel Deep Supervised Hashing with code operation (DSOH) method for large-scale multi-label image retrieval. This approach is in contrast with existing methods in that we respect both the intention gap and the intrinsic multilevel similarity of multi-labels. Particularly, our method allows a user to simultaneously present multiple query images rather than a single one to better express her intention, and correspondingly a separate sub-network in our architecture is specifically designed to fuse the query intention represented by each single query. Furthermore, as in the training stage, each image is annotated with multiple labels to enrich its semantic representation, we propose a new margin-adaptive triplet loss to learn the fine-grained similarity structure of multi-labels, which is known to be hard to capture. The whole system is trained in an end-to-end manner, and our experimental results demonstrate that the proposed method is not only able to learn useful multilevel semantic similarity-preserving binary codes but also achieves state-of-the-art retrieval performance on three popular datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval

自引率

0.00%

发文量