{"title":"Balanced Search Space Partitioning for Distributed Media Redundant Indexing","authors":"André Mourão, João Magalhães","doi":"10.1145/3078971.3078999","DOIUrl":"https://doi.org/10.1145/3078971.3078999","url":null,"abstract":"This paper addresses the problem of balanced, redundant indexing of media information. Our goal is to partition and distribute the search index, taking advantage of the distributed systems properties: balanced load across nodes, redundancy on node down and efficient node usage under concurrent querying. We follow an information compression approach to solve this problem and propose to represent data with overcomplete codebooks, where each document is represented by only a few codewords and an indexing node is responsible for several codewords. Quantization algorithms are designed to fit the original data as best as possible, leading to bias towards codewords that fit the principal directions of data. In this paper, we propose the balanced KSVD (B-KSVD) algorithm, that distributes the allocation of data across a balanced number of codewords, according to the global distribution of data. Indexing experiments showed that B-KSVD can achieve 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. Traditional methods based on k-means need to either use larger codebooks or to inspect a larger portion of the index to achieve the same retrieval performance.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"581 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123209039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LireSolr: A Visual Information Retrieval Server","authors":"M. Lux, M. Riegler, P. Halvorsen, Glenn Macstravic","doi":"10.1145/3078971.3079014","DOIUrl":"https://doi.org/10.1145/3078971.3079014","url":null,"abstract":"In this paper, we present LireSolr, an open source image retrieval server, build on top of the LIRE library and the Apache Solr search server. With LireSolr, visual information retrieval can be run on a server, which allows better distribution of workloads and simplifies applications in several areas including mobile and web. Furthermore, we showcase several example scenarios how LireSolr can be used to point out the broad range of possibilities and applications. The system is easy to install and setup, and the large number of retrieval tools either provided by LIRE or by other Apache Solr is made easily available on the search server. Moreover, our tool demonstrates how predictions from CNNs can easily be used to extend the visual information retrieval functionality.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126781728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Groups Re-identification with Temporal Context","authors":"Michal Koperski, Sławomir Bąk, Peter Carr","doi":"10.1145/3078971.3078978","DOIUrl":"https://doi.org/10.1145/3078971.3078978","url":null,"abstract":"Re-identification methods often require well aligned, unoccluded detections of an entire subject. Such assumptions are impractical in real world scenarios, where people tend to form groups. To circumvent poor detection performance caused by occlusions, we use fixed regions of interest and employ codebook-based visual representations. We account for illumination variations between cameras using a coupled clustering method that learns per-camera codebooks with entries that correspond across cameras. Because predictable movement patterns exist in many scenarios, we also incorporate temporal context to improve re-identification performance. This includes learning expected travel times directly from data and using mutual exclusion constraints to encourage solutions that maintain temporal ordering. Our experiments illustrate the merits of the proposed approach in challenging re-identification scenarios including crowded public spaces.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131181267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Oral Session: Identifying and Linking Interesting Content in Large Audiovisual Repositories","authors":"Maria Eskevich, R. Ordelman","doi":"10.1145/3254625","DOIUrl":"https://doi.org/10.1145/3254625","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134437905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Video Descriptions with Topic Guidance","authors":"Shizhe Chen, Jia Chen, Qin Jin","doi":"10.1145/3078971.3079000","DOIUrl":"https://doi.org/10.1145/3078971.3079000","url":null,"abstract":"Generating video descriptions in natural language (a.k.a. video captioning) is a more challenging task than image captioning as the videos are intrinsically more complicated than images in two aspects. First, videos cover a broader range of topics, such as news, music, sports and so on. Second, multiple topics could coexist in the same video. In this paper, we propose a novel caption model, topic-guided model (TGM), to generate topic-oriented descriptions for videos in the wild via exploiting topic information. In addition to predefined topics, i.e., category tags crawled from the web, we also mine topics in a data-driven way based on training captions by an unsupervised topic mining model. We show that data-driven topics reflect a better topic schema than the predefined topics. As for testing video topic prediction, we treat the topic mining model as teacher to train the student, the topic prediction model, by utilizing the full multi-modalities in the video especially the speech modality. We propose a series of caption models to exploit topic guidance, including implicitly using the topics as input features to generate words related to the topic and explicitly modifying the weights in the decoder with topics to function as an ensemble of topic-aware language decoders. Our comprehensive experimental results on the current largest video caption dataset MSR-VTT prove the effectiveness of our topic-guided model, which significantly surpasses the winning performance in the 2016 MSR video to language challenge.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133896700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TaiChi: A Fine-Grained Action Recognition Dataset","authors":"Shan Sun, Feng Wang, Qi Liang, Liang He","doi":"10.1145/3078971.3079039","DOIUrl":"https://doi.org/10.1145/3078971.3079039","url":null,"abstract":"In this paper, we introduce TaiChi which is a fine-grained action dataset. It consists of unconstrained user-uploaded web videos containing camera motion and partial occlusions which pose new challenges to fine-grained action recognition compared to the existing datasets. In this dataset, 2,772 samples of 58 fine-grained action classes are manually annotated. Additionally, we provide the baseline action recognition results using the state-of-the-art Improved Dense Trajectory feature and Fisher Vector representation with an MAP (Mean Average Precision) of 51.39%.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128990103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-modal Image-Graphics Retrieval by Neural Transfer Learning","authors":"Fabian Junkert, Markus Eberts, A. Ulges, Ulrich Schwanecke","doi":"10.1145/3078971.3078994","DOIUrl":"https://doi.org/10.1145/3078971.3078994","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126191331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 5: Best Paper Candidates","authors":"V. Mezaris","doi":"10.1145/3254624","DOIUrl":"https://doi.org/10.1145/3254624","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"363 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125821067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transductive Visual-Semantic Embedding for Zero-shot Learning","authors":"Xing Xu, Fumin Shen, Yang Yang, Jie Shao, Zi Huang","doi":"10.1145/3078971.3078977","DOIUrl":"https://doi.org/10.1145/3078971.3078977","url":null,"abstract":"Zero-shot learning (ZSL) aims to bridge the knowledge transfer via available semantic representations (e.g., attributes) between labeled source instances of seen classes and unlabelled target instances of unseen classes. Most existing ZSL approaches achieve this by learning a projection from the visual feature space to the semantic representation space based on the source instances, and directly applying it to the target instances. However, the intrinsic manifold structures residing in both semantic representations and visual features are not effectively incorporated into the learned projection function. Moreover, these methods may suffer from the inherent projection shift problem, due to the disjointness between seen and unseen classes. To overcome these drawbacks, we propose a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL. In specific, TVSE first learns a latent embedding space to incorporate the manifold structures in both labeled source instances and unlabeled target instances under the transductive setting. In the learned space, each instance is viewed as a mixture of seen class scores. TVSE then effectively constructs the relational mapping between seen and unseen classes using the available semantic representations, and applies it to map the seen class scores of the target instances to their predictions of unseen classes. Extensive experiments on four benchmark datasets demonstrate that the proposed TVSE achieves competitive performance compared with the state-of-the-arts for zero-shot recognition and retrieval tasks.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123558516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session: Brave New Ideas","authors":"C. Ngo","doi":"10.1145/3254618","DOIUrl":"https://doi.org/10.1145/3254618","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129506262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}