{"title":"Balanced Search Space Partitioning for Distributed Media Redundant Indexing","authors":"André Mourão, João Magalhães","doi":"10.1145/3078971.3078999","DOIUrl":"https://doi.org/10.1145/3078971.3078999","url":null,"abstract":"This paper addresses the problem of balanced, redundant indexing of media information. Our goal is to partition and distribute the search index, taking advantage of the distributed systems properties: balanced load across nodes, redundancy on node down and efficient node usage under concurrent querying. We follow an information compression approach to solve this problem and propose to represent data with overcomplete codebooks, where each document is represented by only a few codewords and an indexing node is responsible for several codewords. Quantization algorithms are designed to fit the original data as best as possible, leading to bias towards codewords that fit the principal directions of data. In this paper, we propose the balanced KSVD (B-KSVD) algorithm, that distributes the allocation of data across a balanced number of codewords, according to the global distribution of data. Indexing experiments showed that B-KSVD can achieve 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. Traditional methods based on k-means need to either use larger codebooks or to inspect a larger portion of the index to achieve the same retrieval performance.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"581 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123209039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LireSolr: A Visual Information Retrieval Server","authors":"M. Lux, M. Riegler, P. Halvorsen, Glenn Macstravic","doi":"10.1145/3078971.3079014","DOIUrl":"https://doi.org/10.1145/3078971.3079014","url":null,"abstract":"In this paper, we present LireSolr, an open source image retrieval server, build on top of the LIRE library and the Apache Solr search server. With LireSolr, visual information retrieval can be run on a server, which allows better distribution of workloads and simplifies applications in several areas including mobile and web. Furthermore, we showcase several example scenarios how LireSolr can be used to point out the broad range of possibilities and applications. The system is easy to install and setup, and the large number of retrieval tools either provided by LIRE or by other Apache Solr is made easily available on the search server. Moreover, our tool demonstrates how predictions from CNNs can easily be used to extend the visual information retrieval functionality.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126781728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Groups Re-identification with Temporal Context","authors":"Michal Koperski, Sławomir Bąk, Peter Carr","doi":"10.1145/3078971.3078978","DOIUrl":"https://doi.org/10.1145/3078971.3078978","url":null,"abstract":"Re-identification methods often require well aligned, unoccluded detections of an entire subject. Such assumptions are impractical in real world scenarios, where people tend to form groups. To circumvent poor detection performance caused by occlusions, we use fixed regions of interest and employ codebook-based visual representations. We account for illumination variations between cameras using a coupled clustering method that learns per-camera codebooks with entries that correspond across cameras. Because predictable movement patterns exist in many scenarios, we also incorporate temporal context to improve re-identification performance. This includes learning expected travel times directly from data and using mutual exclusion constraints to encourage solutions that maintain temporal ordering. Our experiments illustrate the merits of the proposed approach in challenging re-identification scenarios including crowded public spaces.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131181267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Oral Session: Identifying and Linking Interesting Content in Large Audiovisual Repositories","authors":"Maria Eskevich, R. Ordelman","doi":"10.1145/3254625","DOIUrl":"https://doi.org/10.1145/3254625","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134437905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Video Descriptions with Topic Guidance","authors":"Shizhe Chen, Jia Chen, Qin Jin","doi":"10.1145/3078971.3079000","DOIUrl":"https://doi.org/10.1145/3078971.3079000","url":null,"abstract":"Generating video descriptions in natural language (a.k.a. video captioning) is a more challenging task than image captioning as the videos are intrinsically more complicated than images in two aspects. First, videos cover a broader range of topics, such as news, music, sports and so on. Second, multiple topics could coexist in the same video. In this paper, we propose a novel caption model, topic-guided model (TGM), to generate topic-oriented descriptions for videos in the wild via exploiting topic information. In addition to predefined topics, i.e., category tags crawled from the web, we also mine topics in a data-driven way based on training captions by an unsupervised topic mining model. We show that data-driven topics reflect a better topic schema than the predefined topics. As for testing video topic prediction, we treat the topic mining model as teacher to train the student, the topic prediction model, by utilizing the full multi-modalities in the video especially the speech modality. We propose a series of caption models to exploit topic guidance, including implicitly using the topics as input features to generate words related to the topic and explicitly modifying the weights in the decoder with topics to function as an ensemble of topic-aware language decoders. Our comprehensive experimental results on the current largest video caption dataset MSR-VTT prove the effectiveness of our topic-guided model, which significantly surpasses the winning performance in the 2016 MSR video to language challenge.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133896700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TaiChi: A Fine-Grained Action Recognition Dataset","authors":"Shan Sun, Feng Wang, Qi Liang, Liang He","doi":"10.1145/3078971.3079039","DOIUrl":"https://doi.org/10.1145/3078971.3079039","url":null,"abstract":"In this paper, we introduce TaiChi which is a fine-grained action dataset. It consists of unconstrained user-uploaded web videos containing camera motion and partial occlusions which pose new challenges to fine-grained action recognition compared to the existing datasets. In this dataset, 2,772 samples of 58 fine-grained action classes are manually annotated. Additionally, we provide the baseline action recognition results using the state-of-the-art Improved Dense Trajectory feature and Fisher Vector representation with an MAP (Mean Average Precision) of 51.39%.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128990103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-modal Image-Graphics Retrieval by Neural Transfer Learning","authors":"Fabian Junkert, Markus Eberts, A. Ulges, Ulrich Schwanecke","doi":"10.1145/3078971.3078994","DOIUrl":"https://doi.org/10.1145/3078971.3078994","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126191331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 5: Best Paper Candidates","authors":"V. Mezaris","doi":"10.1145/3254624","DOIUrl":"https://doi.org/10.1145/3254624","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"363 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125821067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transductive Visual-Semantic Embedding for Zero-shot Learning","authors":"Xing Xu, Fumin Shen, Yang Yang, Jie Shao, Zi Huang","doi":"10.1145/3078971.3078977","DOIUrl":"https://doi.org/10.1145/3078971.3078977","url":null,"abstract":"Zero-shot learning (ZSL) aims to bridge the knowledge transfer via available semantic representations (e.g., attributes) between labeled source instances of seen classes and unlabelled target instances of unseen classes. Most existing ZSL approaches achieve this by learning a projection from the visual feature space to the semantic representation space based on the source instances, and directly applying it to the target instances. However, the intrinsic manifold structures residing in both semantic representations and visual features are not effectively incorporated into the learned projection function. Moreover, these methods may suffer from the inherent projection shift problem, due to the disjointness between seen and unseen classes. To overcome these drawbacks, we propose a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL. In specific, TVSE first learns a latent embedding space to incorporate the manifold structures in both labeled source instances and unlabeled target instances under the transductive setting. In the learned space, each instance is viewed as a mixture of seen class scores. TVSE then effectively constructs the relational mapping between seen and unseen classes using the available semantic representations, and applies it to map the seen class scores of the target instances to their predictions of unseen classes. Extensive experiments on four benchmark datasets demonstrate that the proposed TVSE achieves competitive performance compared with the state-of-the-arts for zero-shot recognition and retrieval tasks.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123558516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session: Brave New Ideas","authors":"C. Ngo","doi":"10.1145/3254618","DOIUrl":"https://doi.org/10.1145/3254618","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129506262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}