{"title":"Session details: Keynote 1","authors":"Cees G. M. Snoek","doi":"10.1145/3254612","DOIUrl":"https://doi.org/10.1145/3254612","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126133194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEX-Nets: Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition","authors":"R. Anwer, F. Khan, Joost van de Weijer, Jorma T. Laaksonen","doi":"10.1145/3078971.3079001","DOIUrl":"https://doi.org/10.1145/3078971.3079001","url":null,"abstract":"Recognizing materials and textures in realistic imaging conditions is a challenging computer vision problem. For many years, local features based orderless representations were a dominant approach for texture recognition. Recently deep local features, extracted from the intermediate layers of a Convolutional Neural Network (CNN), are used as filter banks. These dense local descriptors from a deep model, when encoded with Fisher Vectors, have shown to provide excellent results for texture recognition. The CNN models, employed in such approaches, take RGB patches as input and train on a large amount of labeled images. We show that CNN models, which we call TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard deep models trained on RGB patches. We further investigate two deep architectures, namely early and late fusion, to combine the texture and color information. Experiments on benchmark texture datasets clearly demonstrate that TEX-Nets provide complementary information to standard RGB deep network. Our approach provides a large gain of 4.8%, 3.5%, 2.6% and 4.1% respectively in accuracy on the DTD, KTH-TIPS-2a, KTH-TIPS-2b and Texture-10 datasets, compared to the standard RGB network of the same architecture. Further, our final combination leads to consistent improvements over the state-of-the-art on all four datasets.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126160310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMECON: Abstract Meta-Concept Features for Text-Illustration","authors":"Ines Chami, Y. Tamaazousti, H. Borgne","doi":"10.1145/3078971.3078993","DOIUrl":"https://doi.org/10.1145/3078971.3078993","url":null,"abstract":"Cross-media retrieval is a problem of high interest that is at the frontier between computer vision and natural language processing. The state-of-the-art in the domain consists of learning a common space with regard to some constraints of correlation or similarity from two textual and visual modalities that are processed in parallel and possibly jointly. This paper proposes a different approach that considers the cross-modal problem as a supervised mapping of visual modalities to textual ones. Each modality is thus seen as a particular projection of an abstract meta-concept, each of its dimension subsuming several semantic concepts (``meta'' aspect) but may not correspond to an actual one (``abstract'' aspect). In practice, the textual modality is used to generate a multi-label representation, further used to map the visual modality through a simple shallow neural network. While being quite easy to implement, the experiments show that our approach significantly outperforms the state-of-the-art on Flickr-8K and Flickr-30K datasets for the text-illustration task. The source code is available at http://perso.ecp.fr/~tamaazouy/.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128491731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 4: Cross-media Retrieval (Spotlight presentations)","authors":"Giorgos Tolias","doi":"10.1145/3254627","DOIUrl":"https://doi.org/10.1145/3254627","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129092353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Analysis of Image Search Intent: Intent Recognition in Image Search from User Behavior and Visual Content","authors":"M. Soleymani, M. Riegler, P. Halvorsen","doi":"10.1145/3078971.3078995","DOIUrl":"https://doi.org/10.1145/3078971.3078995","url":null,"abstract":"Users search for multimedia content with different underlying motivations or intentions. Study of user search intentions is an emerging topic in information retrieval since understanding why a user is searching for a content is crucial for satisfying the user's need. In this paper, we aimed at automatically recognizing a user's intent for image search in the early stage of a search session. We designed seven different search scenarios under the intent conditions of finding items, re-finding items and entertainment. We collected facial expressions, physiological responses, eye gaze and implicit user interactions from 51 participants who performed seven different search tasks on a custom-built image retrieval platform. We analyzed the users' spontaneous and explicit reactions under different intent conditions. Finally, we trained machine learning models to predict users' search intentions from the visual content of the visited images, the user interactions and the spontaneous responses. After fusing the visual and user interaction features, our system achieved the F-1 score of 0.722 for classifying three classes in a user-independent cross-validation. We found that eye gaze and implicit user interactions, including mouse movements and keystrokes are the most informative features. Given that the most promising results are obtained by modalities that can be captured unobtrusively and online, the results demonstrate the feasibility of deploying such methods for improving multimedia retrieval platforms.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130809372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view Manifold Learning for Media Interestingness Prediction","authors":"Yang Liu, Zhonglei Gu, Yiu-ming Cheung, K. Hua","doi":"10.1145/3078971.3079021","DOIUrl":"https://doi.org/10.1145/3078971.3079021","url":null,"abstract":"Media interestingness prediction plays an important role in many real-world applications and attracts much research attention recently. In this paper, we aim to investigate this problem from the perspective of supervised feature extraction. Specifically, we design a novel algorithm dubbed Multi-view Manifold Learning (M) to uncover the latent factors that are capable of distinguishing interesting media data from non-interesting ones. By modelling both geometry preserving criterion and discrimination maximization criterion in a unified framework, M2L learns a common subspace for data from multiple views. The analytical solution of M2L is obtained by solving a generalized eigen-decomposition problem. Experiments on the Predicting Media Interestingness Dataset validate the effectiveness of the proposed method.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"403 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126679270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query and Keyframe Representations for Ad-hoc Video Search","authors":"Fotini Markatopoulou, Damianos Galanopoulos, V. Mezaris, I. Patras","doi":"10.1145/3078971.3079041","DOIUrl":"https://doi.org/10.1145/3078971.3079041","url":null,"abstract":"This paper presents a fully-automatic method that combines video concept detection and textual query analysis in order to solve the problem of ad-hoc video search. We present a set of NLP steps that cleverly analyse different parts of the query in order to convert it to related semantic concepts, we propose a new method for transforming concept-based keyframe and query representations into a common semantic embedding space, and we show that our proposed combination of concept-based representations with their corresponding semantic embeddings results to improved video search accuracy. Our experiments in the TRECVID AVS 2016 and the Video Search 2008 datasets show the effectiveness of the proposed method compared to other similar approaches.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123277153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VideoAnalysis4ALL: An On-line Tool for the Automatic Fragmentation and Concept-based Annotation, and the Interactive Exploration of Videos","authors":"Chrysa Collyda, Evlampios Apostolidis, Alexandros Pournaras, Fotini Markatopoulou, V. Mezaris, I. Patras","doi":"10.1145/3078971.3079015","DOIUrl":"https://doi.org/10.1145/3078971.3079015","url":null,"abstract":"This paper presents the VideoAnalysis4ALL tool that supports the automatic fragmentation and concept-based annotation of videos, and the exploration of the annotated video fragments through an interactive user interface. The developed web application decomposes the video into two different granularities, namely shots and scenes, and annotates each fragment by evaluating the existence of a number (several hundreds) of high-level visual concepts in the keyframes extracted from these fragments. Through the analysis the tool enables the identification and labeling of semantically coherent video fragments, while its user interfaces allow the discovery of these fragments with the help of human-interpretable concepts. The integrated state-of-the-art video analysis technologies perform very well and, by exploiting the processing capabilities of multi-thread / multi-core architectures, reduce the time required for analysis to approximately one third of the video's duration, thus making the analysis three times faster than real-time processing.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"348 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123326798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ClusterTag: Interactive Visualization, Clustering and Tagging Tool for Big Image Collections","authors":"Konstantin Pogorelov, M. Riegler, P. Halvorsen, C. Griwodz","doi":"10.1145/3078971.3079018","DOIUrl":"https://doi.org/10.1145/3078971.3079018","url":null,"abstract":"Exploring and annotating collections of images without meta-data is a complex task which requires convenient ways of presenting datasets to a user. Visual analytics and information visualization can help users by providing interfaces, and in this paper, we present an open source application that allows users from any domain to use feature-based clustering of large image collections to perform explorative browsing and annotation. For this, we use various image feature extraction mechanisms, different unsupervised clustering algorithms and hierarchical image collection visualization. The performance of the presented open source software allows users to process and display thousands of images at the same time by utilizing heterogeneous resources such as GPUs and different optimization techniques.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126313588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 2: Multimedia Indexing (Oral presentations)","authors":"A. Ulges","doi":"10.1145/3254620","DOIUrl":"https://doi.org/10.1145/3254620","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115656085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}