{"title":"Session details: Keynote 1","authors":"Cees G. M. Snoek","doi":"10.1145/3254612","DOIUrl":"https://doi.org/10.1145/3254612","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126133194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEX-Nets: Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition","authors":"R. Anwer, F. Khan, Joost van de Weijer, Jorma T. Laaksonen","doi":"10.1145/3078971.3079001","DOIUrl":"https://doi.org/10.1145/3078971.3079001","url":null,"abstract":"Recognizing materials and textures in realistic imaging conditions is a challenging computer vision problem. For many years, local features based orderless representations were a dominant approach for texture recognition. Recently deep local features, extracted from the intermediate layers of a Convolutional Neural Network (CNN), are used as filter banks. These dense local descriptors from a deep model, when encoded with Fisher Vectors, have shown to provide excellent results for texture recognition. The CNN models, employed in such approaches, take RGB patches as input and train on a large amount of labeled images. We show that CNN models, which we call TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard deep models trained on RGB patches. We further investigate two deep architectures, namely early and late fusion, to combine the texture and color information. Experiments on benchmark texture datasets clearly demonstrate that TEX-Nets provide complementary information to standard RGB deep network. Our approach provides a large gain of 4.8%, 3.5%, 2.6% and 4.1% respectively in accuracy on the DTD, KTH-TIPS-2a, KTH-TIPS-2b and Texture-10 datasets, compared to the standard RGB network of the same architecture. Further, our final combination leads to consistent improvements over the state-of-the-art on all four datasets.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126160310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMECON: Abstract Meta-Concept Features for Text-Illustration","authors":"Ines Chami, Y. Tamaazousti, H. Borgne","doi":"10.1145/3078971.3078993","DOIUrl":"https://doi.org/10.1145/3078971.3078993","url":null,"abstract":"Cross-media retrieval is a problem of high interest that is at the frontier between computer vision and natural language processing. The state-of-the-art in the domain consists of learning a common space with regard to some constraints of correlation or similarity from two textual and visual modalities that are processed in parallel and possibly jointly. This paper proposes a different approach that considers the cross-modal problem as a supervised mapping of visual modalities to textual ones. Each modality is thus seen as a particular projection of an abstract meta-concept, each of its dimension subsuming several semantic concepts (``meta'' aspect) but may not correspond to an actual one (``abstract'' aspect). In practice, the textual modality is used to generate a multi-label representation, further used to map the visual modality through a simple shallow neural network. While being quite easy to implement, the experiments show that our approach significantly outperforms the state-of-the-art on Flickr-8K and Flickr-30K datasets for the text-illustration task. The source code is available at http://perso.ecp.fr/~tamaazouy/.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128491731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 4: Cross-media Retrieval (Spotlight presentations)","authors":"Giorgos Tolias","doi":"10.1145/3254627","DOIUrl":"https://doi.org/10.1145/3254627","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129092353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Analysis of Image Search Intent: Intent Recognition in Image Search from User Behavior and Visual Content","authors":"M. Soleymani, M. Riegler, P. Halvorsen","doi":"10.1145/3078971.3078995","DOIUrl":"https://doi.org/10.1145/3078971.3078995","url":null,"abstract":"Users search for multimedia content with different underlying motivations or intentions. Study of user search intentions is an emerging topic in information retrieval since understanding why a user is searching for a content is crucial for satisfying the user's need. In this paper, we aimed at automatically recognizing a user's intent for image search in the early stage of a search session. We designed seven different search scenarios under the intent conditions of finding items, re-finding items and entertainment. We collected facial expressions, physiological responses, eye gaze and implicit user interactions from 51 participants who performed seven different search tasks on a custom-built image retrieval platform. We analyzed the users' spontaneous and explicit reactions under different intent conditions. Finally, we trained machine learning models to predict users' search intentions from the visual content of the visited images, the user interactions and the spontaneous responses. After fusing the visual and user interaction features, our system achieved the F-1 score of 0.722 for classifying three classes in a user-independent cross-validation. We found that eye gaze and implicit user interactions, including mouse movements and keystrokes are the most informative features. Given that the most promising results are obtained by modalities that can be captured unobtrusively and online, the results demonstrate the feasibility of deploying such methods for improving multimedia retrieval platforms.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130809372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view Manifold Learning for Media Interestingness Prediction","authors":"Yang Liu, Zhonglei Gu, Yiu-ming Cheung, K. Hua","doi":"10.1145/3078971.3079021","DOIUrl":"https://doi.org/10.1145/3078971.3079021","url":null,"abstract":"Media interestingness prediction plays an important role in many real-world applications and attracts much research attention recently. In this paper, we aim to investigate this problem from the perspective of supervised feature extraction. Specifically, we design a novel algorithm dubbed Multi-view Manifold Learning (M) to uncover the latent factors that are capable of distinguishing interesting media data from non-interesting ones. By modelling both geometry preserving criterion and discrimination maximization criterion in a unified framework, M2L learns a common subspace for data from multiple views. The analytical solution of M2L is obtained by solving a generalized eigen-decomposition problem. Experiments on the Predicting Media Interestingness Dataset validate the effectiveness of the proposed method.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"403 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126679270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query and Keyframe Representations for Ad-hoc Video Search","authors":"Fotini Markatopoulou, Damianos Galanopoulos, V. Mezaris, I. Patras","doi":"10.1145/3078971.3079041","DOIUrl":"https://doi.org/10.1145/3078971.3079041","url":null,"abstract":"This paper presents a fully-automatic method that combines video concept detection and textual query analysis in order to solve the problem of ad-hoc video search. We present a set of NLP steps that cleverly analyse different parts of the query in order to convert it to related semantic concepts, we propose a new method for transforming concept-based keyframe and query representations into a common semantic embedding space, and we show that our proposed combination of concept-based representations with their corresponding semantic embeddings results to improved video search accuracy. Our experiments in the TRECVID AVS 2016 and the Video Search 2008 datasets show the effectiveness of the proposed method compared to other similar approaches.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123277153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VideoAnalysis4ALL: An On-line Tool for the Automatic Fragmentation and Concept-based Annotation, and the Interactive Exploration of Videos","authors":"Chrysa Collyda, Evlampios Apostolidis, Alexandros Pournaras, Fotini Markatopoulou, V. Mezaris, I. Patras","doi":"10.1145/3078971.3079015","DOIUrl":"https://doi.org/10.1145/3078971.3079015","url":null,"abstract":"This paper presents the VideoAnalysis4ALL tool that supports the automatic fragmentation and concept-based annotation of videos, and the exploration of the annotated video fragments through an interactive user interface. The developed web application decomposes the video into two different granularities, namely shots and scenes, and annotates each fragment by evaluating the existence of a number (several hundreds) of high-level visual concepts in the keyframes extracted from these fragments. Through the analysis the tool enables the identification and labeling of semantically coherent video fragments, while its user interfaces allow the discovery of these fragments with the help of human-interpretable concepts. The integrated state-of-the-art video analysis technologies perform very well and, by exploiting the processing capabilities of multi-thread / multi-core architectures, reduce the time required for analysis to approximately one third of the video's duration, thus making the analysis three times faster than real-time processing.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"348 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123326798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ClusterTag: Interactive Visualization, Clustering and Tagging Tool for Big Image Collections","authors":"Konstantin Pogorelov, M. Riegler, P. Halvorsen, C. Griwodz","doi":"10.1145/3078971.3079018","DOIUrl":"https://doi.org/10.1145/3078971.3079018","url":null,"abstract":"Exploring and annotating collections of images without meta-data is a complex task which requires convenient ways of presenting datasets to a user. Visual analytics and information visualization can help users by providing interfaces, and in this paper, we present an open source application that allows users from any domain to use feature-based clustering of large image collections to perform explorative browsing and annotation. For this, we use various image feature extraction mechanisms, different unsupervised clustering algorithms and hierarchical image collection visualization. The performance of the presented open source software allows users to process and display thousands of images at the same time by utilizing heterogeneous resources such as GPUs and different optimization techniques.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126313588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 2: Multimedia Indexing (Oral presentations)","authors":"A. Ulges","doi":"10.1145/3254620","DOIUrl":"https://doi.org/10.1145/3254620","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115656085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}