MMCommons '15 | Pub Date: 2015-10-30 | DOI: 10.1145/2814815.2814818
D. Joshi, Matthew L. Cooper, Francine Chen, Yan-Ying Chen
{"title":"Building User Profiles from Shared Photos","authors":"D. Joshi, Matthew L. Cooper, Francine Chen, Yan-Ying Chen","doi":"10.1145/2814815.2814818","DOIUrl":"https://doi.org/10.1145/2814815.2814818","url":null,"abstract":"In this paper, we analyze the association between a social media user's photo content and their interests. Visual content of photos is analyzed using state-of-the-art deep learning based automatic concept recognition. We compute an aggregate visual concept signature for each user. User tags that have been manually applied to their photos are also used to construct a tf-idf based signature per user. We also obtain social groups that users join to represent their social interests. In an effort to compare the visual-based versus tag-based user profiles with social interests, we compare corresponding similarity matrices with a reference similarity matrix based on users' group memberships. A random baseline is also included that groups users by random sampling while preserving the actual group sizes. A difference metric is proposed and it is shown that the combination of visual and text features better approximates the group-based similarity matrix than either modality individually. We also validate the visual analysis against the reference inter-user similarity using the Spearman rank correlation coefficient. Finally we cluster users by their visual signatures and rank clusters using a cluster uniqueness criteria.","PeriodicalId":215083,"journal":{"name":"MMCommons '15","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116547704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMCommons '15 | Pub Date: 2015-10-30 | DOI: 10.1145/2814815.2816986
Julia Bernd, Damian Borth, C. Carrano, Jaeyoung Choi, Benjamin Elizalde, G. Friedland, L. Gottlieb, Karl S. Ni, R. Pearce, Douglas N. Poland, Khalid Ashraf, David A. Shamma, B. Thomee
{"title":"Kickstarting the Commons: The YFCC100M and the YLI Corpora","authors":"Julia Bernd, Damian Borth, C. Carrano, Jaeyoung Choi, Benjamin Elizalde, G. Friedland, L. Gottlieb, Karl S. Ni, R. Pearce, Douglas N. Poland, Khalid Ashraf, David A. Shamma, B. Thomee","doi":"10.1145/2814815.2816986","DOIUrl":"https://doi.org/10.1145/2814815.2816986","url":null,"abstract":"The publication of the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M)--to date the largest open-access collection of photos and videos--has provided a unique opportunity to stimulate new research in multimedia analysis and retrieval. To make the YFCC100M even more valuable, we have started working towards supplementing it with a comprehensive set of precomputed features and high-quality ground truth annotations. As part of our efforts, we are releasing the YLI feature corpus, as well as the YLI-GEO and YLI-MED annotation subsets. Under the Multimedia Commons Project (MMCP), we are currently laying the groundwork for a common platform and framework around the YFCC100M that (i) facilitates researchers in contributing additional features and annotations, (ii) supports experimentation on the dataset, and (iii) enables sharing of obtained results. This paper describes the YLI features and annotations released thus far, and sketches our vision for the MMCP.","PeriodicalId":215083,"journal":{"name":"MMCommons '15","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121692820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMCommons '15 | Pub Date: 2015-10-30 | DOI: 10.1145/2814815.2814817
A. Mathews, Lexing Xie, Xuming He
{"title":"Studying Object Naming with Online Photos and Caption","authors":"A. Mathews, Lexing Xie, Xuming He","doi":"10.1145/2814815.2814817","DOIUrl":"https://doi.org/10.1145/2814815.2814817","url":null,"abstract":"We explore what names people use to describe visual concepts and why these names are chosen. Choosing object names has been a topic of interest in cognitive psychology, but a systematic, data-driven approach for naming at the scale of thousands of objects does not yet exist. First, we find that visual context has interpretable effects on visual naming, by analysing the MSCOCO dataset that has manually annotated objects and captions containing the natural language names for the object. We show that taking into account other objects as context helps improve the prediction of object names. We then analyse the naming patterns on a large dataset from Flickr, using automatically detected concepts. Preliminary results indicate that naming patterns can be identified on a large scale, but contrary to the conventional wisdom in cognitive psychology, are not dominated by genus for animals. We further validate the automatic method with a pilot Amazon Mechanical Turk naming experiment, and explore the impact of automatic concept detectors with t-SNE visualizations.","PeriodicalId":215083,"journal":{"name":"MMCommons '15","volume":"28 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116594440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMCommons '15 | Pub Date: 2015-10-30 | DOI: 10.1145/2814815.2814816
M. Ravanelli, Benjamin Elizalde, Julia Bernd, G. Friedland
{"title":"Insights into Audio-Based Multimedia Event Classification with Neural Networks","authors":"M. Ravanelli, Benjamin Elizalde, Julia Bernd, G. Friedland","doi":"10.1145/2814815.2814816","DOIUrl":"https://doi.org/10.1145/2814815.2814816","url":null,"abstract":"Multimedia Event Detection (MED) aims to identify events-also called scenes-in videos, such as a flash mob or a wedding ceremony. Audio content information complements cues such as visual content and text. In this paper, we explore the optimization of neural networks (NNs) for audio-based multimedia event classification, and discuss some insights towards more effectively using this paradigm for MED. We explore different architectures, in terms of number of layers and number of neurons. We also assess the performance impact of pre-training with Restricted Boltzmann Machines (RBMs) in contrast with random initialization, and explore the effect of varying the context window for the input to the NNs. Lastly, we compare the performance of Hidden Markov Models (HMMs) with a discriminative classifier for the event classification. We used the publicly available event-annotated YLI-MED dataset. Our results showed a performance improvement of more than 6% absolute accuracy compared to the latest results reported in the literature. Interestingly, these results were obtained with a single-layer neural network with random initialization, suggesting that standard approaches with deep learning and RBM pre-training are not fully adequate to address the high-level video event-classification task.","PeriodicalId":215083,"journal":{"name":"MMCommons '15","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132273061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMCommons '15 | Pub Date: 2015-10-30 | DOI: 10.1145/2814815.2814820
Sebastian Kalkowski, Christian Schulze, A. Dengel, Damian Borth
{"title":"Real-time Analysis and Visualization of the YFCC100m Dataset","authors":"Sebastian Kalkowski, Christian Schulze, A. Dengel, Damian Borth","doi":"10.1145/2814815.2814820","DOIUrl":"https://doi.org/10.1145/2814815.2814820","url":null,"abstract":"With the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset, a novel dataset was introduced to the computer vision and multimedia research community. To maximize the benefit for the research community and utilize its potential, this dataset has to be made accessible by tools allowing to search for target concepts within the dataset and mechanism to browse images and videos of the dataset. Following best practice from data collections, such as ImageNet and MS COCO, this paper presents means of accessibility for the YFCC100m dataset. This includes a global analysis of the dataset and an online browser to explore and investigate subsets of the dataset in real-time. Providing statistics of the queried images and videos will enable researchers to refine their query successively, such that the users desired subset of interest can be narrowed down quickly. The final set of image and video can be downloaded as URLs from the browser for further processing.","PeriodicalId":215083,"journal":{"name":"MMCommons '15","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128609826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMCommons '15 | Pub Date: 2015-10-30 | DOI: 10.1145/2814815.2814821
Hamid Izadinia, Bryan C. Russell, Ali Farhadi, M. Hoffman, Aaron Hertzmann
{"title":"Deep Classifiers from Image Tags in the Wild","authors":"Hamid Izadinia, Bryan C. Russell, Ali Farhadi, M. Hoffman, Aaron Hertzmann","doi":"10.1145/2814815.2814821","DOIUrl":"https://doi.org/10.1145/2814815.2814821","url":null,"abstract":"This paper proposes direct learning of image classification from image tags in the wild, without filtering. Each wild tag is supplied by the user who shared the image online. Enormous numbers of these tags are freely available, and they give insight about the image categories important to users and to image classification. Our main contribution is an analysis of the Flickr 100 Million Image dataset, including several useful observations about the statistics of these tags. We introduce a large-scale robust classification algorithm, in order to handle the inherent noise in these tags, and a calibration procedure to better predict objective annotations. We show that freely available, wild tag can obtain similar or superior results to large databases of costly manual annotations.","PeriodicalId":215083,"journal":{"name":"MMCommons '15","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129553908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMCommons '15 | Pub Date: 2015-10-30 | DOI: 10.1145/2814815.2814819
Adrian Daniel Popescu, Eleftherios Spyromitros Xioufis, S. Papadopoulos, H. Borgne, Y. Kompatsiaris
{"title":"Toward an Automatic Evaluation of Retrieval Performance with Large Scale Image Collections","authors":"Adrian Daniel Popescu, Eleftherios Spyromitros Xioufis, S. Papadopoulos, H. Borgne, Y. Kompatsiaris","doi":"10.1145/2814815.2814819","DOIUrl":"https://doi.org/10.1145/2814815.2814819","url":null,"abstract":"The public availability of large-scale multimedia collections, such as YFCC, facilitates the evaluation of image retrieval approaches in real-life conditions. However, due to their size, the creation of exhaustive ground truth would require huge annotation effort, even for limited sets of queries. This paper investigates whether it is possible to estimate retrieval performance in absence of manually created ground truth data. Our hypothesis is that it is possible to leverage existing weak user annotations (tags) to automatically build ground truth data. To test our hypothesis, we implemented a large-scale retrieval pipeline based on two state-of-the-art image descriptors and two compressed versions of each. The top 50 results obtained with each configuration are manually annotated in order to estimate their performance. Alternately, we produce an automatic performance estimation that is based on pre-existing user tags. The automatic performance estimations exhibit strong positive correlation with the manual ones and the systems rankings obtained in the two evaluation settings are found to be similar. This indicates that, although incomplete and sometimes imprecision, weak user annotations can be effectively exploited to assess retrieval performance. As a by-product, we release state-of-the-art image features, as well as a reusable evaluation package that will encourage the use of YFCC in the community.","PeriodicalId":215083,"journal":{"name":"MMCommons '15","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116104131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}