Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval: Latest Publications

An Entropy Model for Loiterer Retrieval across Multiple Surveillance Cameras
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206049
Maguell L. T. L. Sandifort, Jianquan Liu, Shoji Nishimura, Wolfgang Hürst
Abstract: Loitering is a suspicious behavior that often leads to criminal actions, such as pickpocketing and illegal entry. Tracking methods can determine suspicious behavior based on trajectory, but they require continuous appearance and are difficult to scale up to multi-camera systems. Using the duration of appearance of features works across multiple cameras, but does not consider major aspects of loitering behavior, such as repeated appearance and the trajectory of candidates. We introduce an entropy model that maps the locations of a person's features onto a heatmap. It can be used as an abstraction of trajectory tracking across multiple surveillance cameras. We evaluate our method on several datasets and compare it to other loitering detection methods. The results show that our approach performs comparably to the state of the art while providing additional interesting candidates.
Cited by: 10
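As a rough illustration of the heatmap-plus-entropy idea in the abstract above, the sketch below bins a person's detected positions into a coarse grid and scores the occupancy distribution with Shannon entropy. The grid size, frame resolution, and toy trajectories are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def location_entropy(detections, grid=(8, 8), frame=(1920, 1080)):
    """Bin detected (x, y) positions into a coarse heatmap and return
    the Shannon entropy (in bits) of the occupancy distribution."""
    heat = np.zeros(grid)
    for x, y in detections:
        gx = min(int(x / frame[0] * grid[1]), grid[1] - 1)
        gy = min(int(y / frame[1] * grid[0]), grid[0] - 1)
        heat[gy, gx] += 1
    p = heat[heat > 0] / heat.sum()        # distribution over occupied cells
    return float(-(p * np.log2(p)).sum())

# Toy tracks: one person lingering around a spot, one crossing the frame.
loiterer = [(200 + dx, 300 + dy) for dx in range(0, 50, 5) for dy in range(0, 50, 5)]
passerby = [(19 * i, 540) for i in range(100)]
print(location_entropy(loiterer))  # low: mass concentrated in one or two cells
print(location_entropy(passerby))  # high: spread across the grid

In this toy example the lingering track scores well below the crossing track, which spreads roughly uniformly over eight cells; how such scores are thresholded or combined across cameras is the paper's contribution and is not reproduced here.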
Linguistic Patterns and Cross Modality-based Image Retrieval for Complex Queries
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206050
Chandramani Chaudhary, Poonam Goyal, Joel Ruben Antony Moniz, Navneet Goyal, Yi-Ping Phoebe Chen
Abstract: With the rising prevalence of social media and the ease of sharing images, people with specific needs and applications, such as known-item search and multimedia question answering, have started searching for visual content using complex queries. A complex query consists of multiple concepts whose attributes are arranged to convey semantics. It is less effective to answer such queries by simply appending the search results gathered from individual concepts, or subsets of concepts, present in the query. In this paper, we propose to exploit the query constituents and the relationships among them. The proposed approach determines image-query relevance by integrating three models: a linguistic pattern-based textual model, a visual model, and a cross-modality model. We extract linguistic patterns from complex queries, gather their related crawled images, and assign relevance scores to images in the corpus. The relevance scores are then used to rank the images. We experiment on more than 140k images and compare NDCG@n scores with state-of-the-art image ranking methods for complex queries. The image ranking obtained by our approach also outperforms that obtained by a popular search engine.
Cited by: 4
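As a hedged sketch of the score integration and NDCG@n evaluation mentioned above: the snippet fuses per-image relevance scores from three component models with fixed weights and computes NDCG over the resulting ranking. The weights and the tiny score vectors are placeholders, not the paper's learned models.

import numpy as np

def fuse(text_s, vis_s, cross_s, w=(0.4, 0.3, 0.3)):
    """Late-fuse per-image relevance scores from the three component models."""
    return w[0] * text_s + w[1] * vis_s + w[2] * cross_s

def ndcg_at_n(relevance_by_rank, n):
    """NDCG@n for a ranked list of graded relevance labels."""
    rel = np.asarray(relevance_by_rank[:n], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = (rel * discounts).sum()
    ideal = np.sort(np.asarray(relevance_by_rank, dtype=float))[::-1][:n]
    idcg = (ideal * discounts[:len(ideal)]).sum()
    return dcg / idcg if idcg > 0 else 0.0

scores = fuse(np.array([0.9, 0.2, 0.5]),   # textual model scores
              np.array([0.4, 0.8, 0.3]),   # visual model scores
              np.array([0.7, 0.1, 0.6]))   # cross-modality model scores
order = np.argsort(-scores)                # rank images by fused score
labels = [2, 0, 1]                         # hypothetical graded relevance
print(order, ndcg_at_n([labels[i] for i in order], n=3))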
Asymmetric Discrete Cross-Modal Hashing
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206034
Xin Luo, P. Zhang, Ye Wu, Zhen-Duo Chen, Hua-Junjie Huang, Xin-Shun Xu
Abstract: Recently, cross-modal hashing (CMH) methods have attracted much attention. Many methods have been explored; however, some issues still need further consideration: 1) how to efficiently construct the correlations among heterogeneous modalities; 2) how to solve the NP-hard optimization problem and avoid the large quantization errors generated by relaxation; and 3) how to handle the complex and difficult problem, shared by most CMH methods, of simultaneously learning the hash codes and hash functions. To address these challenges, we present a novel cross-modal hashing algorithm named Asymmetric Discrete Cross-Modal Hashing (ADCH). Specifically, it leverages the collective matrix factorization technique to learn common latent representations while preserving not only the cross-correlation between different modalities but also the semantic similarity. Instead of relaxing the binary constraints, it generates the hash codes directly using an iterative optimization algorithm proposed in this work. Based on the learnt hash codes, ADCH further learns a series of binary classifiers as hash functions, which is flexible and effective. Extensive experiments are conducted on three real-world datasets. The results demonstrate that ADCH outperforms several state-of-the-art cross-modal hashing baselines.
Cited by: 23
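Once binary codes are learned, the cross-modal retrieval step itself reduces to Hamming-distance ranking, as in the sketch below. The codes here are random stand-ins; ADCH's discrete optimization and classifier-based hash functions are not reproduced.

import numpy as np

rng = np.random.default_rng(0)
bits = 32
image_codes = rng.integers(0, 2, (1000, bits), dtype=np.uint8)  # database (image modality)
text_code = rng.integers(0, 2, bits, dtype=np.uint8)            # query (text modality)

def hamming_rank(query, database, k=5):
    """Rank database codes by Hamming distance to the query code."""
    dists = (database != query).sum(axis=1)
    top = np.argsort(dists)[:k]
    return top, dists[top]

idx, d = hamming_rank(text_code, image_codes)
print(idx, d)  # ids and Hamming distances of the top-5 cross-modal matches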
Session details: Keynote 2
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3252923
S. Satoh
Cited by: 0
The PMEmo Dataset for Music Emotion Recognition
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206037
Ke-jun Zhang, Hui Zhang, Simeng Li, Chang-yuan Yang, Lingyun Sun
Abstract: Music Emotion Recognition (MER) has recently received considerable attention. To support MER research, which requires large music content libraries, we present the PMEmo dataset, containing emotion annotations of 794 songs as well as simultaneous electrodermal activity (EDA) signals. A music emotion experiment was carefully designed for collecting a high-quality affective-annotated music corpus, recruiting 457 subjects. The dataset is publicly available to the research community and is foremost intended for benchmarking in music emotion retrieval and recognition. To make evaluating methodologies for music affective analysis straightforward, it also includes pre-computed audio feature sets. In addition, manually selected chorus excerpts of songs (compressed in MP3) are provided to facilitate chorus-related research. In this article, we describe in detail the resource acquisition, subject selection, experiment design, and annotation collection procedures, as well as the dataset content and a data reliability analysis. We also illustrate its usage in some simple music emotion recognition tasks, which testify to the PMEmo dataset's competence for MER work. Compared to other homogeneous datasets, PMEmo is novel in the organization and management of the recruited annotators, and it is also characterized by its large amount of music with simultaneous physiological signals.
Cited by: 57
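A minimal sketch of loading per-song valence/arousal annotations from a PMEmo-style CSV layout follows; the file name and column names (musicId, valence, arousal) are assumptions about the distribution format and may not match the released archive exactly.

import csv

def load_annotations(path):
    """Collect (valence, arousal) pairs per song id from an annotation CSV."""
    songs = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            songs.setdefault(row["musicId"], []).append(
                (float(row["valence"]), float(row["arousal"])))
    return songs

annotations = load_annotations("static_annotations.csv")  # hypothetical file name
for song_id, va in list(annotations.items())[:3]:
    print(song_id, va[0])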
Multi-label Triplet Embeddings for Image Annotation from User-Generated Tags
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206061
Zachary Seymour, Zhongfei Zhang
Abstract: This work studies the representational embedding of images and their corresponding annotations, in the form of tag metadata, such that, given a piece of the raw data in one modality, the corresponding semantic description can be retrieved in terms of the raw data in another. While convolutional neural networks (CNNs) have been widely and successfully applied in this domain for detecting semantically simple scenes or categories (even though many such objects may be simultaneously present in an image), this work approaches the task of dealing with image annotations in the context of noisy, user-generated, and semantically complex multi-labels, widely available from social media sites. In this case, the labels for an image are diverse, noisy, and often not specifically related to an object, but rather descriptive or user-specific. Furthermore, the existing deep image annotation literature using this type of data typically utilizes the so-called CNN-RNN framework, combining convolutional and recurrent neural networks. We offer a discussion of why RNNs may not be the best choice in this case, though they have been shown to perform well on similar captioning tasks. Our model exploits the latent image-text space through the use of a triplet loss framework to learn a joint embedding space for the images and their tags, in the presence of multiple, potentially positive exemplar classes. We present state-of-the-art results on the representational properties of these embeddings on several image annotation datasets to show the promise of this approach.
Cited by: 2
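The triplet objective described above, adapted to multiple positive tags per anchor image, might look like the following numpy sketch; the margin, embedding size, and random vectors are illustrative, not the authors' trained model.

import numpy as np

def multi_positive_triplet_loss(anchor, positives, negatives, margin=0.2):
    """Hinge loss pulling every positive tag closer to the anchor image
    than any negative tag, averaged over all (positive, negative) pairs."""
    loss = 0.0
    for p in positives:
        d_pos = np.sum((anchor - p) ** 2)
        for n in negatives:
            d_neg = np.sum((anchor - n) ** 2)
            loss += max(0.0, margin + d_pos - d_neg)
    return loss / (len(positives) * len(negatives))

rng = np.random.default_rng(1)
img = rng.normal(size=64)                      # image embedding (anchor)
pos = [rng.normal(size=64) for _ in range(3)]  # embeddings of its tags
neg = [rng.normal(size=64) for _ in range(5)]  # embeddings of unrelated tags
print(multi_positive_triplet_loss(img, pos, neg))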
Searching and Matching Texture-free 3D Shapes in Images
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206057
Shuai Liao, E. Gavves, Cees G. M. Snoek
Abstract: The goal of this paper is to search for and match the best rendered view of a texture-free 3D shape to an object of interest in a 2D query image. Matching rendered views of 3D shapes to RGB images is challenging because 1) 3D shapes are not always a perfect match for the image queries, 2) there is a great domain difference between rendered and RGB images, and 3) estimating the object scale versus distance is inherently ambiguous in images from uncalibrated cameras. In this work we propose a deeply learned matching function that attacks these challenges and can be used in a search engine that finds the appropriate 3D shape and matches it to objects in 2D query images. We evaluate the proposed matching function and search engine with a series of controlled experiments on the 24 most populated vehicle categories in PASCAL3D+. We test the capability of the learned matching function to transfer to unseen 3D shapes and study the overall search engine sensitivity w.r.t. the available 3D shapes and object localization accuracy, showing promising results in retrieving 3D shapes given 2D image queries.
Cited by: 0
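The search step (though not the learned matching function, which is the paper's contribution) can be sketched as nearest-view retrieval over embeddings of rendered views; plain cosine similarity stands in for the deep matcher here, and the dimensions are invented.

import numpy as np

def best_view(query_emb, view_embs):
    """Return index and cosine similarity of the rendered view closest
    to the query-image embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    v = view_embs / np.linalg.norm(view_embs, axis=1, keepdims=True)
    sims = v @ q
    i = int(np.argmax(sims))
    return i, float(sims[i])

rng = np.random.default_rng(2)
views = rng.normal(size=(24 * 36, 128))  # e.g. 36 rendered azimuths per shape
query = rng.normal(size=128)             # embedding of the 2D query object
print(best_view(query, views))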
Session details: Keynote 1
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3252922
K. Aizawa
Cited by: 0
Orion
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3210491
Yusuke Fujisaka
Abstract: Social Networking Services (SNS) depend on user-generated content (UGC). A fraction of UGC is spam, such as adult, scam, and abusive content. To maintain service reliability and avoid criminal activity, content moderation is employed to eliminate spam from SNS. Content moderation consists of manual content-monitoring operations and/or automatic spam filtering. Detecting a small portion of spam among a large amount of UGC mostly relies on manual operation; it therefore requires many human operators and sometimes suffers from human error. In contrast, automatic spam filtering costs less, but it struggles to follow spam's continuously changing trends and may degrade the service experience due to false positives. This presentation introduces an integrated content moderation platform called "Orion", which aims to minimize manual processing and maximize spam detection in UGC data. Orion preserves post history by user and by service, which enables calculating the risk level of each user and deciding whether monitoring is required. Orion also has a scalable API that can perform a number of machine-learning-based filtering processes, such as DNN (deep neural network) and SVM classifiers, for text and images posted in many SNS systems. We show that Orion improves the efficiency of content moderation compared to a fully manual operation.
Cited by: 0
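The per-user risk scoring the abstract mentions could, for example, be a smoothed spam-rate estimate over a user's moderation history, as sketched below; the prior, its weight, and the threshold are invented for illustration and are not Orion's actual rules.

def user_risk(num_flagged, num_posts, prior_rate=0.01, prior_weight=20):
    """Smoothed estimate of a user's spam rate from their post history."""
    return (num_flagged + prior_rate * prior_weight) / (num_posts + prior_weight)

def needs_monitoring(num_flagged, num_posts, threshold=0.05):
    """Route a user to manual review when their risk exceeds the threshold."""
    return user_risk(num_flagged, num_posts) >= threshold

print(user_risk(0, 3), needs_monitoring(0, 3))    # new account, no flags
print(user_risk(4, 30), needs_monitoring(4, 30))  # repeat offender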
Feature Selection and Multimodal Fusion for Estimating Emotions Evoked by Movie Clips
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206074
Yasemin Timar, Nihan Karslioglu, Heysem Kaya, A. A. Salah
Abstract: Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audiovisual features extracted from videos via machine learning approaches to estimate the affective responses of viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency cepstral coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches in the paper and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.
Cited by: 2
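As one example of the fusion strategies such work compares, the sketch below trains a ridge regressor per modality on synthetic data and combines their arousal predictions with a weighted average (decision-level fusion); the features, weights, and regressor choice are assumptions, not the paper's pipeline.

import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(3)
n = 200
audio = rng.normal(size=(n, 20))    # e.g. MFCC statistics per clip
visual = rng.normal(size=(n, 30))   # e.g. hue-saturation histogram per clip
arousal = 0.5 * audio[:, 0] + 0.3 * visual[:, 0] + rng.normal(scale=0.1, size=n)

w_a = ridge_fit(audio, arousal)     # audio-only regressor
w_v = ridge_fit(visual, arousal)    # visual-only regressor
fused = 0.6 * (audio @ w_a) + 0.4 * (visual @ w_v)  # decision-level fusion
print(np.corrcoef(fused, arousal)[0, 1])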