{"title":"Learning to Select Elements for Graphic Design","authors":"Guolong Wang, Zheng Qin, Junchi Yan, Liu Jiang","doi":"10.1145/3372278.3390678","DOIUrl":"https://doi.org/10.1145/3372278.3390678","url":null,"abstract":"Selecting elements for graphic design is essential for ensuring a correct understanding of clients' requirements as well as improving the efficiency of designers before a fine-designed process. Some semi-automatic design tools proposed layout templates where designers always select elements according to the rectangular boxes that specify how elements are placed. In practice, layout and element selection are complementary. Compared to the layout which can be readily obtained from pre-designed templates, it is generally time-consuming to mindfully pick out suitable elements, which calls for an automation of elements selection. To address this, we formulate element selection as a sequential decision-making process and develop a deep element selection network (DESN). Given a layout file with annotated elements, new graphical elements are selected to form graphic designs based on aesthetics and consistency criteria. To train our DESN, we propose an end-to-end, reinforcement learning based framework, where we design a novel reward function that jointly accounts for visual aesthetics and consistency. Based on this, visually readable and aesthetic drafts can be efficiently generated. We further contribute a layout-poster dataset with exhaustively labeled attributes of poster key elements. Qualitative and quantitative results indicate the efficacy of our approach.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115084197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Base Class Selection Algorithms for Few-Shot Classification","authors":"Takumi Ohkuma, Hideki Nakayama","doi":"10.1145/3372278.3390724","DOIUrl":"https://doi.org/10.1145/3372278.3390724","url":null,"abstract":"Few-shot classification is a task to learn a classifier for novel classes with a limited number of examples on top of the known base classes which have a sufficient number of examples. In recent years, significant progress has been achieved on this task. However, despite the importance of selecting the base classes themselves for better knowledge transfer, few works have paid attention to this point. In this paper, we propose two types of base class selection algorithms that are suitable for few-shot classification tasks. One is based on the thesaurus-tree structure of class names, and the other is based on word embeddings. In our experiments using representative few-shot learning methods on the ILSVRC dataset, we show that these two algorithms can significantly improve the performance compared to a naive class selection method. Moreover, they do not require high computational and memory costs, which is an important advantage to scale to a very large number of base classes.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129073946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Actor-Critic Sequence Generation for Relative Difference Captioning","authors":"Z. Fei","doi":"10.1145/3372278.3390679","DOIUrl":"https://doi.org/10.1145/3372278.3390679","url":null,"abstract":"This paper investigates a new task named relative difference caption which aims to generate a sentence to tell the difference between the given image pair. Difference description is a crucial task for developing intelligent machines that can understand and handle changeable visual scenes and applications. Towards that end, we propose a reinforcement learning-based model, which utilizes a policy network and a value network in a decision procedure to collaboratively produce a difference caption. Specifically, the policy network works as an actor to estimate the probability of next word based on the current state and the value network serves as a critic to predict all possible extension values according to current action and state. To encourage generating correct and meaningful descriptions, we leverage a visual-linguistic similarity-based reward function as feedback. Empirical results on the recently released dataset demonstrate the effectiveness of our method in comparison with various baselines and model variants.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114145158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting, Classifying, and Mapping Retail Storefronts Using Street-level Imagery","authors":"Shahin Sharifi Noorian, S. Qiu, A. Psyllidis, A. Bozzon, G. Houben","doi":"10.1145/3372278.3390706","DOIUrl":"https://doi.org/10.1145/3372278.3390706","url":null,"abstract":"Up-to-date listings of retail stores and related building functions are challenging and costly to maintain. We introduce a novel method for automatically detecting, geo-locating, and classifying retail stores and related commercial functions, on the basis of storefronts extracted from street-level imagery. Specifically, we present a deep learning approach that takes storefronts from street-level imagery as input, and directly provides the geo-location and type of commercial function as output. Our method showed a recall of 89.05% and a precision of 88.22% on a real-world dataset of street-level images, which experimentally demonstrated that our approach achieves human-level accuracy while having a remarkable run-time efficiency compared to methods such as Faster Region-Convolutional Neural Networks (Faster R-CNN) and Single Shot Detector (SSD).","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115517415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Attention Multimodal Sentiment Analysis","authors":"Taeyong Kim, Bowon Lee","doi":"10.1145/3372278.3390698","DOIUrl":"https://doi.org/10.1145/3372278.3390698","url":null,"abstract":"Sentiment analysis plays an important role in natural-language processing. It has been performed on multimodal data including text, audio, and video. Previously conducted research does not make full utilization of such heterogeneous data. In this study, we propose a model of Multi-Attention Recurrent Neural Network (MA-RNN) for performing sentiment analysis on multimodal data. The proposed network consists of two attention layers and a Bidirectional Gated Recurrent Neural Network (BiGRU). The first attention layer is used for data fusion and dimensionality reduction, and the second attention layer is used for the augmentation of BiGRU to capture key parts of the contextual information among utterances. Experiments on multimodal sentiment analysis indicate that our proposed model achieves the state-of-the-art performance of 84.31% accuracy on the Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis (CMU-MOSI) dataset. Furthermore, an ablation study is conducted to evaluate the contributions of different components of the network. We believe that our findings of this study may also offer helpful insights into the design of models using multimodal data.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132514895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Discrete Attention Guided Hashing for Face Image Retrieval","authors":"Zhi Xiong, Dayan Wu, Wen Gu, Haisu Zhang, Bo Li, Weiping Wang","doi":"10.1145/3372278.3390683","DOIUrl":"https://doi.org/10.1145/3372278.3390683","url":null,"abstract":"Recently, face image hashing has been proposed in large-scale face image retrieval due to its storage and computational efficiency. However, owing to the large intra-identity variation (same identity with different poses, illuminations, and facial expressions) and the small inter-identity separability (different identities look similar) of face images, existing face image hashing methods have limited power to generate discriminative hash codes. In this work, we propose a deep hashing method specially designed for face image retrieval named deep Discrete Attention Guided Hashing (DAGH). In DAGH, the discriminative power of hash codes is enhanced by a well-designed discrete identity loss, where not only the separability of the learned hash codes for different identities is encouraged, but also the intra-identity variation of the hash codes for the same identities is compacted. Besides, to obtain the fine-grained face features, DAGH employs a multi-attention cascade network structure to highlight discriminative face features. Moreover, we introduce a discrete hash layer into the network, along with the proposed modified backpropagation algorithm, our model can be optimized under discrete constraint. Experiments on two widely used face image retrieval datasets demonstrate the inspiring performance of DAGH over the state-of-the-art face image hashing methods.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127227622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rank-embedded Hashing for Large-scale Image Retrieval","authors":"Haiyan Fu, Ying Li, Hengheng Zhang, Jinfeng Liu, Tao Yao","doi":"10.1145/3372278.3390716","DOIUrl":"https://doi.org/10.1145/3372278.3390716","url":null,"abstract":"With the growth of images on the Internet, plenty of hashing methods are developed to handle the large-scale image retrieval task. Hashing methods map data from high dimension to compact codes, so that they can effectively cope with complicated image features. However, the quantization process of hashing results in unescapable information loss. As a consequence, it is a challenge to measure the similarity between images with generated binary codes. The latest works usually focus on learning deep features and hashing functions simultaneously to preserve the similarity between images, while the similarity metric is fixed. In this paper, we propose a Rank-embedded Hashing (ReHash) algorithm where the ranking list is automatically optimized together with the feedback of the supervised hashing. Specifically, ReHash jointly trains the metric learning and the hashing codes in an end-to-end model. In this way, the similarity between images are enhanced by the ranking process. Meanwhile, the ranking results are an additional supervision for the hashing function learning as well. Extensive experiments show that our ReHash outperforms the state-of-the-art hashing methods for large-scale image retrieval.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126492928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Enhanced Neural Fashion Trend Forecasting","authors":"Yunshan Ma, Yujuan Ding, Xun Yang, Lizi Liao, Wai Keung Wong, Tat-Seng Chua","doi":"10.1145/3372278.3390677","DOIUrl":"https://doi.org/10.1145/3372278.3390677","url":null,"abstract":"Fashion trend forecasting is a crucial task for both academia andindustry. Although some efforts have been devoted to tackling this challenging task, they only studied limited fashion elements with highly seasonal or simple patterns, which could hardly reveal thereal fashion trends. Towards insightful fashion trend forecasting,this work focuses on investigating fine-grained fashion element trends for specific user groups. We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram with extracted time series fashion element records and user information. Furthermore, to effectively model the time series data of fashion elements with rather complex patterns, we propose a Knowledge Enhanced Recurrent Network model (KERN) which takes advantage of the capability of deep recurrent neural networks in modeling time series data. Moreover, it leverages internal and external knowledgein fashion domain that affects the time-series patterns of fashion element trends. Such incorporation of domain knowledge further enhances the deep learning model in capturing the patterns of specific fashion elements and predicting the future trends. Extensive experiments demonstrate that the proposed KERN model can effectively capture the complicated patterns of objective fashion elements, therefore making preferable fashion trend forecast.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129859867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do","authors":"Keith Curtis, G. Awad, Shahzad Rajput, I. Soboroff","doi":"10.1145/3372278.3390742","DOIUrl":"https://doi.org/10.1145/3372278.3390742","url":null,"abstract":"In this paper we propose a new evaluation challenge and direction in the area of High-level Video Understanding. The challenge we are proposing is designed to test automatic video analysis and understanding, and how accurately systems can comprehend a movie in terms of actors, entities, events and their relationship to each other. A pilot High-Level Video Understanding (HLVU) dataset of open source movies were collected for human assessors to build a knowledge graph representing each of them. A set of queries will be derived from the knowledge graph to test systems on retrieving relationships among actors, as well as reasoning and retrieving non-visual concepts. The objective is to benchmark if a computer system can \"understand\" non-explicit but obvious relationships the same way humans do when they watch the same movies. This is long-standing problem that is being addressed in the text domain and this project moves similar research to the video domain. Work of this nature is foundational to future video analytics and video understanding technologies. This work can be of interest to streaming services and broadcasters hoping to provide more intuitive ways for their customers to interact with and consume video content.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"2021 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132154827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Retrieval using Multi-scale CNN Features Pooling","authors":"Federico Vaccaro, M. Bertini, Tiberio Uricchio, A. Bimbo","doi":"10.1145/3372278.3390732","DOIUrl":"https://doi.org/10.1145/3372278.3390732","url":null,"abstract":"In this paper, we address the problem of image retrieval by learning images representation based on the activations of a Convolutional Neural Network. We present an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on NetVLAD and a triplet mining procedure based on samples difficulty to obtain an effective image representation. Extensive experiments show that our approach is able to reach state-of-the-art results on three standard datasets.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123470107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}