Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval最新文献

Visual Question Answering With a Hybrid Convolution Recurrent Model 基于混合卷积循环模型的视觉问答

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206054

Philipp Harzig, C. Eggert, R. Lienhart

引用次数: 3

Who to Ask: An Intelligent Fashion Consultant 问谁:一个聪明的时尚顾问

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206092

Yangbangyan Jiang, Qianqian Xu, Xiaochun Cao, Qingming Huang

引用次数: 5

Recognizing Actions in Wearable-Camera Videos by Training Classifiers on Fixed-Camera Videos 用固定摄像机视频训练分类器识别可穿戴摄像机视频中的动作

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206041

Yang Mi, Kang Zheng, Song Wang

{"title":"Recognizing Actions in Wearable-Camera Videos by Training Classifiers on Fixed-Camera Videos","authors":"Yang Mi, Kang Zheng, Song Wang","doi":"10.1145/3206025.3206041","DOIUrl":"https://doi.org/10.1145/3206025.3206041","url":null,"abstract":"Recognizing human actions in wearable camera videos, such as videos taken by GoPro or Google Glass, can benefit many multimedia applications. By mixing the complex and non-stop motion of the camera, motion features extracted from videos of the same action may show very large variation and inconsistency. It is very difficult to collect sufficient videos to cover all such variations and use them to train action classifiers with good generalization ability. In this paper, we develop a new approach to train action classifiers on a relatively smaller set of fixed-camera videos with different views, and then apply them to recognize actions in wearable-camera videos. In this approach, we temporally divide the input video into many shorter video segments and transform the motion features to stable ones in each video segment, in terms of a fixed view defined by an anchor frame in the segment. Finally, we use sparse coding to estimate the action likelihood in each segment, followed by combining the likelihoods from all the video segments for action recognition. We conduct experiments by training on a set of fixed-camera videos and testing on a set of wearable-camera videos, with very promising results.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125340326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Image Annotation Retrieval with Text-Domain Label Denoising 基于文本域标签去噪的图像标注检索

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206063

Zachary Seymour, Zhongfei Zhang

{"title":"Image Annotation Retrieval with Text-Domain Label Denoising","authors":"Zachary Seymour, Zhongfei Zhang","doi":"10.1145/3206025.3206063","DOIUrl":"https://doi.org/10.1145/3206025.3206063","url":null,"abstract":"This work explores the problem of making user-generated text data, in the form of noisy tags, usable for tasks such as automatic image annotation and image retrieval by denoising the data. Earlier work in this area has focused on filtering out noisy, sparse, or incorrect tags by representing an image by the accumulation of the tags of its nearest neighbors in the visual space. However, this imposes an expensive preprocessing step that must be performed for each new set of images and tags and relies on assumptions about the way the images have been labelled that we find do not always hold. We instead propose a technique for calculating a set of probabilities for the relevance of each tag for a given image relying soley on information in the text domain, namely through widely-available pretrained continous word embeddings. By first clustering the word embeddings for the tags, we calculate a set of weights representing the probability that each tag is meaningful to the image content. Given the set of tags denoised in this way, we use kernel canonical correlation analysis (KCCA) to learn a semantic space which we can project into to retrieve relevant tags for unseen images or to retrieve images for unseen tags. This work also explores the deficiencies of the use of continuous word embeddings for automatic image annotation in the existing KCCA literature and introduces a new method for constructing textual kernel matrices using these word vectors that improves tag retrieval results for both user-generated tags as well as expert labels.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115185771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Dense Dilated Network for Few Shot Action Recognition 基于密集扩张网络的少弹动作识别

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206028

Baohan Xu, Hao Ye, Yingbin Zheng, Heng Wang, Tianyu Luwang, Yu-Gang Jiang

引用次数: 33

Dynamic Construction and Manipulation of Hierarchical Quartic Image Graphs 分层四次图像图的动态构造与处理

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206093

N. Hezel, K. U. Barthel

引用次数: 6

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval 2018年ACM多媒体检索国际会议论文集

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025

引用次数: 4

VP-ReID: Vehicle and Person Re-Identification System VP-ReID:车辆和人员再识别系统

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206086

Longhui Wei, Xiaobin Liu, Jianing Li, Shiliang Zhang

引用次数: 18

3D Image-based Indoor Localization Joint With WiFi Positioning 基于3D图像的室内定位结合WiFi定位

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206070

G. Lu, Jingkuan Song

{"title":"3D Image-based Indoor Localization Joint With WiFi Positioning","authors":"G. Lu, Jingkuan Song","doi":"10.1145/3206025.3206070","DOIUrl":"https://doi.org/10.1145/3206025.3206070","url":null,"abstract":"We realize a system that utilizes WiFi to facilitate the image-based localization system, which avoids the confusion caused by the similar decoration inside the buildings. While WiFi-based localization thread obtains the rough location information, the image-based localization thread retrieves the best matching images and clusters the camera poses associated with the images into different location candidates. The image cluster closest to the WiFi localization outcome is selected for the exact camera pose estimation. The usage of WiFi significantly reduces the search scope, avoiding the extensive search of millions of descriptors in a 3D model. In the image-based localization stage, we also propose a novel 2D-to-2D-to-3D localization framework which follows a coarse-to-fine strategy to quickly locate the query image in several location candidates and performs the local feature matching and camera pose estimation after choosing the correct image location by WiFi positioning. The entire system demonstrates significant benefits in combining both images and WiFi signals in localization tasks and great potential to be deployed in real applications.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128067482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Binary Coding by Matrix Classifier for Efficient Subspace Retrieval 基于矩阵分类器的二值编码子空间检索

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date : 2018-06-05 DOI: 10.1145/3206025.3206058

Lei Zhou, Xiao Bai, Xianglong Liu, Jun Zhou

{"title":"Binary Coding by Matrix Classifier for Efficient Subspace Retrieval","authors":"Lei Zhou, Xiao Bai, Xianglong Liu, Jun Zhou","doi":"10.1145/3206025.3206058","DOIUrl":"https://doi.org/10.1145/3206025.3206058","url":null,"abstract":"Fast retrieval in large-scale database with high-dimensional subspaces is an important task in many applications, such as image retrieval, video retrieval and visual recognition. This can be facilitated by approximate nearest subspace (ANS) retrieval which requires effective subspace representation. Most of the existing methods for this problem represent subspace by point in the Euclidean space or the Grassmannian space before applying the approximate nearest neighbor (ANN) search. However, the efficiency of these methods can not be guaranteed because the subspace representation step can be very time consuming when coping with high dimensional data. Moreover, the transforming process for subspace to point will cause subspace structural information loss which influence the retrieval accuracy. In this paper, we present a new approach for hashing-based ANS retrieval. The proposed method learns the binary codes for given subspace set following a similarity preserving criterion. It simultaneously leverages the learned binary codes to train matrix classifiers as hash functions. This method can directly binarize a subspace without transforming it into a vector. Therefore, it can efficiently solve the large-scale and high-dimensional multimedia data retrieval problem. Experiments on face recognition and video retrieval show that our method outperforms several state-of-the-art methods in both efficiency and accuracy.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125696546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10