Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval: Latest Publications

Visual Question Answering With a Hybrid Convolution Recurrent Model
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206054
Philipp Harzig, C. Eggert, R. Lienhart
Visual Question Answering (VQA) is a relatively new task that infers answer sentences for an input image coupled with a corresponding question. Instead of being generated dynamically, answers are usually inferred by finding the most probable answer in a fixed set of possible answers. Previous work did not address the problem of finding all possible answers, but modeled the answering part of VQA only as a classification task. To tackle this problem, we infer answer sentences with a Long Short-Term Memory (LSTM) network that allows us to dynamically generate answers for (image, question) pairs. In a series of experiments, we discover an end-to-end Deep Neural Network structure that dynamically answers questions about a given input image using an LSTM decoder network. With this approach, we are able to generate both less common answers, which are not considered by classification models, and more complex answers, as found in datasets whose answers consist of more than three words.
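The dynamic-answer idea can be illustrated with a toy greedy decoding loop. The `next_token_logits` table below is a hypothetical stand-in for one step of an LSTM decoder conditioned on image and question features; it is not the authors' trained model.

```python
# Illustrative greedy decoding loop (a sketch, not the paper's code):
# generate an answer token by token until an end-of-sequence token,
# the way an LSTM answer decoder would.
def greedy_decode(next_token_logits, start="<s>", eos="</s>", max_len=10):
    """next_token_logits maps previous token -> {token: score};
    it stands in for one decoder step."""
    tokens, prev = [], start
    for _ in range(max_len):
        scores = next_token_logits[prev]
        prev = max(scores, key=scores.get)  # argmax over the vocabulary
        if prev == eos:
            break
        tokens.append(prev)
    return " ".join(tokens)

# Toy "model": after <s> the most likely token is "two", then "dogs", then </s>.
toy = {
    "<s>":  {"two": 0.7, "a": 0.3},
    "two":  {"dogs": 0.9, "</s>": 0.1},
    "dogs": {"</s>": 0.8, "running": 0.2},
}
print(greedy_decode(toy))  # two dogs
```

Because decoding is open-ended rather than a pick from a fixed answer list, multi-word answers fall out naturally, which is the point the abstract makes.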
Citations: 3
Who to Ask: An Intelligent Fashion Consultant
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206092
Yangbangyan Jiang, Qianqian Xu, Xiaochun Cao, Qingming Huang
Humankind has always been in pursuit of fashion. Nevertheless, people are often troubled by collocating clothes, e.g., tops, bottoms, shoes, and accessories, from the numerous fashion items in their closets. Moreover, employing a fashion stylist can be expensive and inconvenient. In this paper, we present Stile, an end-to-end intelligent fashion-consultant system that generates stylish outfits for given items. Unlike previous systems, our framework considers the global compatibility of the fashion items in an outfit and models the dependencies among items in a fixed order via a bidirectional LSTM. It thereby guarantees that items in the same outfit share a similar style and that the resulting outfit contains neither redundant items nor missing essential categories. The demonstration shows that our proposed system provides a practical and convenient way to find natural and proper fashion outfits.
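The "neither redundant nor missing items" constraint can be sketched as a simple validity check. The category list here is an assumption for illustration; the paper's actual categories and ordering may differ.

```python
# Toy sketch of the outfit-validity constraint described above (assumed
# categories): each essential category appears exactly once, in a fixed order.
ESSENTIAL = ["top", "bottom", "shoes"]

def valid_outfit(items):
    """items: list of (category, item_name) pairs in outfit order."""
    cats = [c for c, _ in items]
    in_fixed_order = cats == [c for c in ESSENTIAL if c in cats]
    one_each = all(cats.count(c) == 1 for c in ESSENTIAL)
    return in_fixed_order and one_each

print(valid_outfit([("top", "tee"), ("bottom", "jeans"), ("shoes", "sneakers")]))  # True
print(valid_outfit([("top", "tee"), ("top", "shirt"), ("shoes", "boots")]))        # False
```

In the paper this structure comes out of generating items category by category with the bidirectional LSTM rather than from an explicit post-hoc check.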
Citations: 5
Recognizing Actions in Wearable-Camera Videos by Training Classifiers on Fixed-Camera Videos
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206041
Yang Mi, Kang Zheng, Song Wang
Recognizing human actions in wearable-camera videos, such as videos taken by a GoPro or Google Glass, can benefit many multimedia applications. Because the complex, non-stop motion of the camera is mixed in, motion features extracted from videos of the same action may show very large variation and inconsistency. It is very difficult to collect enough videos to cover all such variations and use them to train action classifiers with good generalization ability. In this paper, we develop a new approach that trains action classifiers on a relatively small set of fixed-camera videos with different views and then applies them to recognize actions in wearable-camera videos. We temporally divide the input video into many shorter segments and, in each segment, transform the motion features to stable ones with respect to a fixed view defined by an anchor frame. Finally, we use sparse coding to estimate the action likelihood in each segment and combine the likelihoods from all segments for action recognition. We conduct experiments by training on a set of fixed-camera videos and testing on a set of wearable-camera videos, with very promising results.
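The final combination step can be illustrated with toy numbers: given a per-action likelihood for each segment (in the paper these come from sparse coding), one simple way to combine them, assumed here for illustration, is to sum log-likelihoods across segments and take the argmax.

```python
import math

# Hedged sketch of combining per-segment action likelihoods (toy values,
# not the paper's sparse-coding outputs): sum log-likelihoods over the
# segments and pick the highest-scoring action.
def classify(segment_likelihoods):
    actions = segment_likelihoods[0].keys()
    score = {a: sum(math.log(seg[a]) for seg in segment_likelihoods)
             for a in actions}
    return max(score, key=score.get)

segments = [
    {"walk": 0.6, "run": 0.3, "wave": 0.1},
    {"walk": 0.5, "run": 0.4, "wave": 0.1},
    {"walk": 0.7, "run": 0.2, "wave": 0.1},
]
print(classify(segments))  # walk
```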
Citations: 6
Image Annotation Retrieval with Text-Domain Label Denoising
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206063
Zachary Seymour, Zhongfei Zhang
This work explores making user-generated text data, in the form of noisy tags, usable for tasks such as automatic image annotation and image retrieval by denoising the data. Earlier work in this area has focused on filtering out noisy, sparse, or incorrect tags by representing an image through the accumulated tags of its nearest neighbors in the visual space. However, this imposes an expensive preprocessing step that must be performed for each new set of images and tags, and it relies on assumptions about how the images were labelled that we find do not always hold. We instead propose a technique for calculating the probability that each tag is relevant to a given image, relying solely on information in the text domain, namely widely available pretrained continuous word embeddings. By first clustering the word embeddings of the tags, we calculate a set of weights representing the probability that each tag is meaningful to the image content. Given the tags denoised in this way, we use kernel canonical correlation analysis (KCCA) to learn a semantic space into which we can project to retrieve relevant tags for unseen images or to retrieve images for unseen tags. This work also explores the deficiencies of continuous word embeddings for automatic image annotation in the existing KCCA literature and introduces a new method for constructing textual kernel matrices from these word vectors that improves tag-retrieval results for both user-generated tags and expert labels.
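The text-domain weighting idea can be sketched in a few lines: score each tag by how close its embedding lies to the bulk of the other tags, so an off-topic tag gets a low relevance weight. The 2-D toy vectors and the centroid-based weighting below are assumptions for illustration; the paper clusters real pretrained embeddings.

```python
import math

# Hedged sketch of text-domain tag denoising: weight each tag by the cosine
# similarity of its (toy, 2-D) embedding to the centroid of all tag
# embeddings, so outlier tags receive low weights.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def tag_weights(embeddings):
    dim = len(next(iter(embeddings.values())))
    centroid = [sum(vec[d] for vec in embeddings.values()) / len(embeddings)
                for d in range(dim)]
    return {tag: cosine(vec, centroid) for tag, vec in embeddings.items()}

# "birthday" points away from the beach-themed tags, so it scores lowest.
toy = {"beach": (1.0, 0.1), "sea": (0.9, 0.2), "birthday": (-0.8, 0.9)}
weights = tag_weights(toy)
print(weights["beach"] > weights["birthday"])  # True
```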
Citations: 3
Dense Dilated Network for Few Shot Action Recognition
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206028
Baohan Xu, Hao Ye, Yingbin Zheng, Heng Wang, Tianyu Luwang, Yu-Gang Jiang
Video action recognition has been widely studied recently. Training deep neural networks requires a large amount of well-labeled video; on the other hand, videos in the same class share high-level semantic similarity. In this paper, we introduce a novel neural-network architecture that simultaneously captures local and long-term spatial-temporal information. We propose a dilated dense network whose blocks are composed of densely connected dilated-convolution layers. The framework fuses each layer's outputs to learn high-level representations, and the representations are robust even with only a few training snippets. Aggregations of dilated dense blocks are also explored. We conduct extensive experiments on UCF101 and demonstrate the effectiveness of the proposed method, especially with few training examples.
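The building block, a dilated convolution, is easy to show in miniature: the same kernel covers a wider temporal span as the dilation grows, without adding parameters. This pure-Python 1-D version is an illustration of the operation itself, not the paper's network.

```python
# Minimal pure-Python 1-D dilated convolution (valid padding): the kernel
# taps are spaced `dilation` steps apart, so the receptive field grows
# with dilation while the kernel size stays fixed.
def dilated_conv1d(x, kernel, dilation):
    reach = (len(kernel) - 1) * dilation  # receptive-field span minus one
    return [sum(kernel[k] * x[t + k * dilation] for k in range(len(kernel)))
            for t in range(len(x) - reach)]

x = [1, 2, 3, 4, 5, 6]
print(dilated_conv1d(x, [1, 1], dilation=1))  # [3, 5, 7, 9, 11]
print(dilated_conv1d(x, [1, 1], dilation=2))  # [4, 6, 8, 10]
```

In a densely connected block, each layer's output would additionally be concatenated with the outputs of all earlier layers before the next convolution.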
Citations: 33
Dynamic Construction and Manipulation of Hierarchical Quartic Image Graphs
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206093
N. Hezel, K. U. Barthel
Over the last years, we have published papers on intuitive image-graph navigation and shown how to build static hierarchical image graphs efficiently. In this paper, we showcase new results and present techniques to dynamically construct and manipulate such graphs. They connect similar images and perform well in retrieval tasks regardless of the number of nodes. By applying an improved fast self-sorting-map algorithm, entire image collections structured in a graph can be explored with a user interface resembling common navigation services.
Citations: 6
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (front matter)
Pub Date: 2018-06-05 | DOI: 10.1145/3206025
Citations: 4
VP-ReID: Vehicle and Person Re-Identification System
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206086
Longhui Wei, Xiaobin Liu, Jianing Li, Shiliang Zhang
With the capability of locating and tracking specific suspects or vehicles in a large camera network, person re-identification (ReID) and vehicle ReID show potential to be key technologies in smart surveillance systems, and they have attracted considerable attention from both academia and industry. To demonstrate our recent research progress on these two tasks, we develop a robust and efficient vehicle and person ReID system named VP-ReID. The system is built on our recent work, including Deep Convolutional Neural Network design for discriminative feature extraction, efficient offline indexing, and distance-metric optimization for deep feature learning. Constructed upon those algorithms, VP-ReID identifies the query vehicle or person efficiently and accurately in a large gallery set.
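The retrieval step shared by ReID systems of this kind reduces to ranking gallery items by distance between extracted features. The toy 2-D feature vectors and names below are placeholders; a real system would use high-dimensional deep features and the offline index mentioned above.

```python
import math

# Hedged sketch of the ReID retrieval step: rank gallery entries by the
# Euclidean distance between the query feature and each gallery feature
# (toy 2-D vectors stand in for deep features).
def rank_gallery(query_feat, gallery):
    """gallery: list of (identity, feature) pairs."""
    return sorted(gallery, key=lambda item: math.dist(query_feat, item[1]))

gallery = [
    ("car_A", (0.9, 0.1)),
    ("car_B", (0.1, 0.9)),
    ("car_C", (0.8, 0.2)),
]
query = (0.88, 0.12)
ranking = [name for name, _ in rank_gallery(query, gallery)]
print(ranking)  # ['car_A', 'car_C', 'car_B']
```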
Citations: 18
3D Image-based Indoor Localization Joint With WiFi Positioning
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206070
G. Lu, Jingkuan Song
We realize a system that uses WiFi to facilitate image-based localization, avoiding the confusion caused by similar decoration inside buildings. While the WiFi-based localization thread obtains rough location information, the image-based localization thread retrieves the best-matching images and clusters the camera poses associated with those images into different location candidates. The image cluster closest to the WiFi localization outcome is selected for exact camera-pose estimation. Using WiFi significantly reduces the search scope, avoiding an extensive search over millions of descriptors in a 3D model. In the image-based localization stage, we also propose a novel 2D-to-2D-to-3D localization framework that follows a coarse-to-fine strategy: it quickly locates the query image among several location candidates and performs local feature matching and camera-pose estimation after the correct image location is chosen by WiFi positioning. The entire system demonstrates the significant benefit of combining images and WiFi signals in localization tasks and great potential for deployment in real applications.
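The coarse step, picking the image cluster nearest the WiFi estimate, is simple to sketch. The cluster names and coordinates are invented for illustration; in the system the fine-grained 2D-to-3D matching would then run only inside the selected cluster.

```python
import math

# Hedged sketch of the coarse WiFi-gated selection: among the location
# clusters produced by image retrieval, keep the one whose center lies
# closest to the WiFi position estimate.
def pick_cluster(wifi_xy, clusters):
    return min(clusters, key=lambda c: math.dist(wifi_xy, c["center"]))

clusters = [
    {"name": "lobby",    "center": (2.0, 1.0)},
    {"name": "corridor", "center": (20.0, 4.0)},
]
chosen = pick_cluster((3.0, 1.5), clusters)
print(chosen["name"])  # lobby
```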
Citations: 4
Binary Coding by Matrix Classifier for Efficient Subspace Retrieval
Pub Date: 2018-06-05 | DOI: 10.1145/3206025.3206058
Lei Zhou, Xiao Bai, Xianglong Liu, Jun Zhou
Fast retrieval in large-scale databases with high-dimensional subspaces is an important task in many applications, such as image retrieval, video retrieval, and visual recognition. It can be facilitated by approximate nearest subspace (ANS) retrieval, which requires an effective subspace representation. Most existing methods represent a subspace by a point in Euclidean or Grassmannian space before applying approximate nearest neighbor (ANN) search. However, their efficiency cannot be guaranteed because the subspace-representation step can be very time-consuming for high-dimensional data. Moreover, the subspace-to-point transformation loses subspace structural information, which hurts retrieval accuracy. In this paper, we present a new approach to hashing-based ANS retrieval. It learns binary codes for a given subspace set under a similarity-preserving criterion and simultaneously uses the learned codes to train matrix classifiers as hash functions. The method binarizes a subspace directly, without transforming it into a vector, and can therefore efficiently solve large-scale, high-dimensional multimedia retrieval problems. Experiments on face recognition and video retrieval show that our method outperforms several state-of-the-art methods in both efficiency and accuracy.
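Why binary codes make retrieval fast can be shown with a minimal sketch: once items are reduced to short bit strings, search is a cheap Hamming-distance scan. The sign-of-projection binarization and the toy values below are generic hashing illustrations, not the paper's learned matrix-classifier hash functions.

```python
# Generic hashing-retrieval sketch (not the paper's learned hash functions):
# binarize real-valued projections by sign, then retrieve by Hamming distance.
def binarize(values):
    return tuple(1 if v >= 0 else 0 for v in values)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

database = {
    "face_1": binarize((0.3, -1.2, 0.8)),
    "face_2": binarize((-0.5, 0.9, -0.1)),
}
query = binarize((0.2, -0.7, 0.4))  # same sign pattern as face_1
best = min(database, key=lambda k: hamming(database[k], query))
print(best)  # face_1
```

The paper's contribution is producing such codes for whole subspaces directly, skipping the lossy subspace-to-vector step this sketch assumes.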
Citations: 10