Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval: Latest Publications

An Entropy Model for Loiterer Retrieval across Multiple Surveillance Cameras
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206049
Maguell L. T. L. Sandifort, Jianquan Liu, Shoji Nishimura, Wolfgang Hürst
Abstract: Loitering is a suspicious behavior that often leads to criminal actions, such as pickpocketing and illegal entry. Tracking methods can determine suspicious behavior based on trajectory, but they require continuous appearance and are difficult to scale up to multi-camera systems. Using the duration of appearance of features works across multiple cameras, but does not consider major aspects of loitering behavior, such as repeated appearance and the trajectory of candidates. We introduce an entropy model that maps the locations of a person's features onto a heatmap. It can be used as an abstraction of trajectory tracking across multiple surveillance cameras. We evaluate our method on several datasets and compare it to other loitering detection methods. The results show that our approach performs comparably to the state of the art while providing additional interesting candidates.
Cited by: 10
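As a rough illustration of the heatmap-plus-entropy idea in the abstract above, the sketch below bins a person's detected positions into a coarse grid and scores the occupancy distribution with Shannon entropy. The grid size, frame resolution, and toy trajectories are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def location_entropy(detections, grid=(8, 8), frame=(1920, 1080)):
    """Bin detected (x, y) positions into a coarse heatmap and return
    the Shannon entropy (in bits) of the occupancy distribution."""
    heat = np.zeros(grid)
    for x, y in detections:
        gx = min(int(x / frame[0] * grid[1]), grid[1] - 1)
        gy = min(int(y / frame[1] * grid[0]), grid[0] - 1)
        heat[gy, gx] += 1
    p = heat[heat > 0] / heat.sum()        # distribution over occupied cells
    return float(-(p * np.log2(p)).sum())

# Toy tracks: one person lingering around a spot, one crossing the frame.
loiterer = [(200 + dx, 300 + dy) for dx in range(0, 50, 5) for dy in range(0, 50, 5)]
passerby = [(19 * i, 540) for i in range(100)]
print(location_entropy(loiterer))  # low: mass concentrated in one or two cells
print(location_entropy(passerby))  # high: spread across the grid

In this toy example the lingering track scores well below the crossing track, which spreads roughly uniformly over eight cells; how such scores are thresholded or combined across cameras is the paper's contribution and is not reproduced here.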
Linguistic Patterns and Cross Modality-based Image Retrieval for Complex Queries
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206050
Chandramani Chaudhary, Poonam Goyal, Joel Ruben Antony Moniz, Navneet Goyal, Yi-Ping Phoebe Chen
Abstract: With the rising prevalence of social media and the ease of sharing images, people with specific needs and applications, such as known-item search and multimedia question answering, have started searching for visual content using complex queries. A complex query consists of multiple concepts whose attributes are arranged to convey semantics. It is less effective to answer such queries by simply appending the search results gathered from individual concepts, or subsets of concepts, present in the query. In this paper, we propose to exploit the query constituents and the relationships among them. The proposed approach determines image-query relevance by integrating three models: a linguistic pattern-based textual model, a visual model, and a cross-modality model. We extract linguistic patterns from complex queries, gather their related crawled images, and assign relevance scores to images in the corpus. The relevance scores are then used to rank the images. We experiment on more than 140k images and compare NDCG@n scores with state-of-the-art image ranking methods for complex queries. The image ranking obtained by our approach also outperforms that obtained by a popular search engine.
Cited by: 4
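As a hedged sketch of the score integration and NDCG@n evaluation mentioned above: the snippet fuses per-image relevance scores from three component models with fixed weights and computes NDCG over the resulting ranking. The weights and the tiny score vectors are placeholders, not the paper's learned models.

import numpy as np

def fuse(text_s, vis_s, cross_s, w=(0.4, 0.3, 0.3)):
    """Late-fuse per-image relevance scores from the three component models."""
    return w[0] * text_s + w[1] * vis_s + w[2] * cross_s

def ndcg_at_n(relevance_by_rank, n):
    """NDCG@n for a ranked list of graded relevance labels."""
    rel = np.asarray(relevance_by_rank[:n], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = (rel * discounts).sum()
    ideal = np.sort(np.asarray(relevance_by_rank, dtype=float))[::-1][:n]
    idcg = (ideal * discounts[:len(ideal)]).sum()
    return dcg / idcg if idcg > 0 else 0.0

scores = fuse(np.array([0.9, 0.2, 0.5]),   # textual model scores
              np.array([0.4, 0.8, 0.3]),   # visual model scores
              np.array([0.7, 0.1, 0.6]))   # cross-modality model scores
order = np.argsort(-scores)                # rank images by fused score
labels = [2, 0, 1]                         # hypothetical graded relevance
print(order, ndcg_at_n([labels[i] for i in order], n=3))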
Asymmetric Discrete Cross-Modal Hashing
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206034
Xin Luo, P. Zhang, Ye Wu, Zhen-Duo Chen, Hua-Junjie Huang, Xin-Shun Xu
Abstract: Recently, cross-modal hashing (CMH) methods have attracted much attention. Many methods have been explored; however, some issues still need further consideration: 1) how to efficiently construct the correlations among heterogeneous modalities; 2) how to solve the NP-hard optimization problem and avoid the large quantization errors generated by relaxation; and 3) how to handle the complex and difficult problem, shared by most CMH methods, of simultaneously learning the hash codes and hash functions. To address these challenges, we present a novel cross-modal hashing algorithm named Asymmetric Discrete Cross-Modal Hashing (ADCH). Specifically, it leverages the collective matrix factorization technique to learn common latent representations while preserving not only the cross-correlation between different modalities but also the semantic similarity. Instead of relaxing the binary constraints, it generates the hash codes directly using an iterative optimization algorithm proposed in this work. Based on the learnt hash codes, ADCH further learns a series of binary classifiers as hash functions, which is flexible and effective. Extensive experiments are conducted on three real-world datasets. The results demonstrate that ADCH outperforms several state-of-the-art cross-modal hashing baselines.
Cited by: 23
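Once binary codes are learned, the cross-modal retrieval step itself reduces to Hamming-distance ranking, as in the sketch below. The codes here are random stand-ins; ADCH's discrete optimization and classifier-based hash functions are not reproduced.

import numpy as np

rng = np.random.default_rng(0)
bits = 32
image_codes = rng.integers(0, 2, (1000, bits), dtype=np.uint8)  # database (image modality)
text_code = rng.integers(0, 2, bits, dtype=np.uint8)            # query (text modality)

def hamming_rank(query, database, k=5):
    """Rank database codes by Hamming distance to the query code."""
    dists = (database != query).sum(axis=1)
    top = np.argsort(dists)[:k]
    return top, dists[top]

idx, d = hamming_rank(text_code, image_codes)
print(idx, d)  # ids and Hamming distances of the top-5 cross-modal matches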
Session details: Keynote 2
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3252923
S. Satoh
Cited by: 0
The PMEmo Dataset for Music Emotion Recognition
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206037
Ke-jun Zhang, Hui Zhang, Simeng Li, Chang-yuan Yang, Lingyun Sun
Abstract: Music Emotion Recognition (MER) has recently received considerable attention. To support MER research, which requires large music content libraries, we present the PMEmo dataset, containing emotion annotations of 794 songs as well as simultaneous electrodermal activity (EDA) signals. A music emotion experiment was carefully designed for collecting a high-quality affective-annotated music corpus, recruiting 457 subjects. The dataset is publicly available to the research community and is foremost intended for benchmarking in music emotion retrieval and recognition. To make evaluating methodologies for music affective analysis straightforward, it also includes pre-computed audio feature sets. In addition, manually selected chorus excerpts of songs (compressed in MP3) are provided to facilitate chorus-related research. In this article, we describe in detail the resource acquisition, subject selection, experiment design, and annotation collection procedures, as well as the dataset content and a data reliability analysis. We also illustrate its usage in some simple music emotion recognition tasks, which testify to the PMEmo dataset's competence for MER work. Compared to other homogeneous datasets, PMEmo is novel in the organization and management of the recruited annotators, and it is also characterized by its large amount of music with simultaneous physiological signals.
Cited by: 57
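A minimal sketch of loading per-song valence/arousal annotations from a PMEmo-style CSV layout follows; the file name and column names (musicId, valence, arousal) are assumptions about the distribution format and may not match the released archive exactly.

import csv

def load_annotations(path):
    """Collect (valence, arousal) pairs per song id from an annotation CSV."""
    songs = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            songs.setdefault(row["musicId"], []).append(
                (float(row["valence"]), float(row["arousal"])))
    return songs

annotations = load_annotations("static_annotations.csv")  # hypothetical file name
for song_id, va in list(annotations.items())[:3]:
    print(song_id, va[0])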
Multi-label Triplet Embeddings for Image Annotation from User-Generated Tags
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206061
Zachary Seymour, Zhongfei Zhang
Abstract: This work studies the representational embedding of images and their corresponding annotations, in the form of tag metadata, such that, given a piece of the raw data in one modality, the corresponding semantic description can be retrieved in terms of the raw data in another. While convolutional neural networks (CNNs) have been widely and successfully applied in this domain for detecting semantically simple scenes or categories (even though many such objects may be simultaneously present in an image), this work approaches the task of dealing with image annotations in the context of noisy, user-generated, and semantically complex multi-labels, widely available from social media sites. In this case, the labels for an image are diverse, noisy, and often not specifically related to an object, but rather descriptive or user-specific. Furthermore, the existing deep image annotation literature using this type of data typically utilizes the so-called CNN-RNN framework, combining convolutional and recurrent neural networks. We offer a discussion of why RNNs may not be the best choice in this case, though they have been shown to perform well on similar captioning tasks. Our model exploits the latent image-text space through the use of a triplet loss framework to learn a joint embedding space for the images and their tags, in the presence of multiple, potentially positive exemplar classes. We present state-of-the-art results on the representational properties of these embeddings on several image annotation datasets to show the promise of this approach.
Cited by: 2
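The triplet objective described above, adapted to multiple positive tags per anchor image, might look like the following numpy sketch; the margin, embedding size, and random vectors are illustrative, not the authors' trained model.

import numpy as np

def multi_positive_triplet_loss(anchor, positives, negatives, margin=0.2):
    """Hinge loss pulling every positive tag closer to the anchor image
    than any negative tag, averaged over all (positive, negative) pairs."""
    loss = 0.0
    for p in positives:
        d_pos = np.sum((anchor - p) ** 2)
        for n in negatives:
            d_neg = np.sum((anchor - n) ** 2)
            loss += max(0.0, margin + d_pos - d_neg)
    return loss / (len(positives) * len(negatives))

rng = np.random.default_rng(1)
img = rng.normal(size=64)                      # image embedding (anchor)
pos = [rng.normal(size=64) for _ in range(3)]  # embeddings of its tags
neg = [rng.normal(size=64) for _ in range(5)]  # embeddings of unrelated tags
print(multi_positive_triplet_loss(img, pos, neg))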
Searching and Matching Texture-free 3D Shapes in Images
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206057
Shuai Liao, E. Gavves, Cees G. M. Snoek
Abstract: The goal of this paper is to search for and match the best rendered view of a texture-free 3D shape to an object of interest in a 2D query image. Matching rendered views of 3D shapes to RGB images is challenging because 1) 3D shapes are not always a perfect match for the image queries, 2) there is a great domain difference between rendered and RGB images, and 3) estimating the object scale versus distance is inherently ambiguous in images from uncalibrated cameras. In this work we propose a deeply learned matching function that attacks these challenges and can be used in a search engine that finds the appropriate 3D shape and matches it to objects in 2D query images. We evaluate the proposed matching function and search engine with a series of controlled experiments on the 24 most populated vehicle categories in PASCAL3D+. We test the capability of the learned matching function to transfer to unseen 3D shapes and study the overall search engine sensitivity w.r.t. the available 3D shapes and object localization accuracy, showing promising results in retrieving 3D shapes given 2D image queries.
Cited by: 0
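The search step (though not the learned matching function, which is the paper's contribution) can be sketched as nearest-view retrieval over embeddings of rendered views; plain cosine similarity stands in for the deep matcher here, and the dimensions are invented.

import numpy as np

def best_view(query_emb, view_embs):
    """Return index and cosine similarity of the rendered view closest
    to the query-image embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    v = view_embs / np.linalg.norm(view_embs, axis=1, keepdims=True)
    sims = v @ q
    i = int(np.argmax(sims))
    return i, float(sims[i])

rng = np.random.default_rng(2)
views = rng.normal(size=(24 * 36, 128))  # e.g. 36 rendered azimuths per shape
query = rng.normal(size=128)             # embedding of the 2D query object
print(best_view(query, views))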
Session details: Keynote 1
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3252922
K. Aizawa
Cited by: 0
Orion
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3210491
Yusuke Fujisaka
Abstract: Social Networking Services (SNS) depend on user-generated content (UGC). A fraction of UGC is spam, such as adult, scam, and abusive content. To maintain service reliability and avoid criminal activity, content moderation is employed to eliminate spam from SNS. Content moderation consists of manual content-monitoring operations and/or automatic spam filtering. Detecting a small portion of spam among a large amount of UGC mostly relies on manual operation; it therefore requires many human operators and sometimes suffers from human error. In contrast, automatic spam filtering costs less, but it struggles to follow spam's continuously changing trends and may degrade the service experience due to false positives. This presentation introduces an integrated content moderation platform called "Orion", which aims to minimize manual processing and maximize spam detection in UGC data. Orion preserves post history by user and by service, which enables calculating the risk level of each user and deciding whether monitoring is required. Orion also has a scalable API that can perform a number of machine-learning-based filtering processes, such as DNN (deep neural network) and SVM classifiers, for text and images posted in many SNS systems. We show that Orion improves the efficiency of content moderation compared to a fully manual operation.
Cited by: 0
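The per-user risk scoring the abstract mentions could, for example, be a smoothed spam-rate estimate over a user's moderation history, as sketched below; the prior, its weight, and the threshold are invented for illustration and are not Orion's actual rules.

def user_risk(num_flagged, num_posts, prior_rate=0.01, prior_weight=20):
    """Smoothed estimate of a user's spam rate from their post history."""
    return (num_flagged + prior_rate * prior_weight) / (num_posts + prior_weight)

def needs_monitoring(num_flagged, num_posts, threshold=0.05):
    """Route a user to manual review when their risk exceeds the threshold."""
    return user_risk(num_flagged, num_posts) >= threshold

print(user_risk(0, 3), needs_monitoring(0, 3))    # new account, no flags
print(user_risk(4, 30), needs_monitoring(4, 30))  # repeat offender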
Feature Selection and Multimodal Fusion for Estimating Emotions Evoked by Movie Clips
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Pub Date: 2018-06-05, DOI: 10.1145/3206025.3206074
Yasemin Timar, Nihan Karslioglu, Heysem Kaya, A. A. Salah
Abstract: Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audiovisual features extracted from videos via machine learning approaches to estimate the affective responses of viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency cepstral coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches in the paper and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.
Cited by: 2
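As one example of the fusion strategies such work compares, the sketch below trains a ridge regressor per modality on synthetic data and combines their arousal predictions with a weighted average (decision-level fusion); the features, weights, and regressor choice are assumptions, not the paper's pipeline.

import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(3)
n = 200
audio = rng.normal(size=(n, 20))    # e.g. MFCC statistics per clip
visual = rng.normal(size=(n, 30))   # e.g. hue-saturation histogram per clip
arousal = 0.5 * audio[:, 0] + 0.3 * visual[:, 0] + rng.normal(scale=0.1, size=n)

w_a = ridge_fit(audio, arousal)     # audio-only regressor
w_v = ridge_fit(visual, arousal)    # visual-only regressor
fused = 0.6 * (audio @ w_a) + 0.4 * (visual @ w_v)  # decision-level fusion
print(np.corrcoef(fused, arousal)[0, 1])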