AMECON: Abstract Meta-Concept Features for Text-Illustration

Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. Pub Date: 2017-06-06. DOI: 10.1145/3078971.3078993

Ines Chami, Y. Tamaazousti, H. Borgne


Citations: 10

Abstract

Cross-media retrieval is a problem of high interest at the frontier between computer vision and natural language processing. The state of the art in the domain consists of learning a common space, subject to constraints of correlation or similarity, from the textual and visual modalities, which are processed in parallel and possibly jointly. This paper proposes a different approach that treats the cross-modal problem as a supervised mapping of the visual modality to the textual one. Each modality is thus seen as a particular projection of an abstract meta-concept space: each dimension subsumes several semantic concepts (the "meta" aspect) but may not correspond to an actual concept (the "abstract" aspect). In practice, the textual modality is used to generate a multi-label representation, which then serves as the target onto which the visual modality is mapped through a simple shallow neural network. Although the approach is quite easy to implement, the experiments show that it significantly outperforms the state of the art on the Flickr-8K and Flickr-30K datasets for the text-illustration task. The source code is available at http://perso.ecp.fr/~tamaazouy/.
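The pipeline outlined in the abstract (text turned into a multi-label target, then a shallow network mapping visual features onto that target) can be illustrated with a toy sketch. This is not the authors' implementation: the word groups in `GROUPS`, the synthetic image features, the layer sizes, and the cosine-based ranking are all assumptions made for illustration, since the abstract does not specify how the meta-concepts are built or how retrieval is scored.

```python
import numpy as np

# Hypothetical word groups standing in for meta-concepts (an assumption, not from the paper).
GROUPS = [{"dog", "puppy"}, {"beach", "sea", "sand"}, {"ball", "frisbee"}]

def text_to_labels(caption, groups=GROUPS):
    """Multi-hot vector: dimension k is 1 if the caption contains any word of group k."""
    words = set(caption.lower().split())
    return np.array([1.0 if words & g else 0.0 for g in groups])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(params, X):
    """Forward pass of a one-hidden-layer network with sigmoid outputs."""
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)

def train_shallow_net(X, Y, hidden=16, lr=1.0, steps=500, seed=0):
    """Full-batch gradient descent on the mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W1, b1 = rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, k)), np.zeros(k)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)
        P = sigmoid(H @ W2 + b2)
        losses.append(-np.mean(Y * np.log(P + 1e-9) + (1 - Y) * np.log(1 - P + 1e-9)))
        G = (P - Y) / (n * k)              # gradient w.r.t. the output logits
        GH = (G @ W2.T) * (1 - H ** 2)     # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return (W1, b1, W2, b2), losses

def rank_images(query, image_scores):
    """Rank images by cosine similarity between the query labels and predicted labels."""
    q = text_to_labels(query)
    norms = np.linalg.norm(image_scores, axis=1) * np.linalg.norm(q) + 1e-9
    return np.argsort(-(image_scores @ q) / norms)

captions = [
    "a dog runs on the beach",
    "a puppy catches a frisbee",
    "waves on the sea",
    "a ball on the sand",
]
Y = np.stack([text_to_labels(c) for c in captions])

# Synthetic stand-in for image features (real features would come from a vision model).
rng = np.random.default_rng(1)
X = Y @ rng.normal(size=(3, 8)) + 0.01 * rng.normal(size=(4, 8))

params, losses = train_shallow_net(X, Y)
ranking = rank_images("a puppy on the sand", predict(params, X))
print(losses[0], losses[-1], ranking)
```

In this sketch only the visual side is learned; the text side is a fixed labeling function, which mirrors the abstract's framing of the task as a supervised mapping rather than a jointly learned common space.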

