{"title":"Music-Guided Video Summarization using Quadratic Assignments","authors":"Thomas Mensink, Thomas Jongstra, P. Mettes, Cees G. M. Snoek","doi":"10.1145/3078971.3079024","DOIUrl":"https://doi.org/10.1145/3078971.3079024","url":null,"abstract":"This paper aims to automatically generate a summary of an unedited video, guided by an externally provided music-track. The tempo, energy and beats in the music determine the choices and cuts in the video summarization. To solve this challenging task, we model video summarization as a quadratic assignment problem. We assign frames to the summary, using rewards based on frame interestingness, plot coherency, audio-visual match, and cut properties. Experimentally we validate our approach on the SumMe dataset. The results show that our music guided summaries are more appealing, and even outperform the current state-of-the-art summarization methods when evaluated on the F1 measure of precision and recall.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115317118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Industry Keynotes","authors":"Neil O'Hare","doi":"10.1145/3254630","DOIUrl":"https://doi.org/10.1145/3254630","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115540336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Multi-Modal Unified Sparse Representation Learning","authors":"Mridula Verma, K. K. Shukla","doi":"10.1145/3078971.3079040","DOIUrl":"https://doi.org/10.1145/3078971.3079040","url":null,"abstract":"Exploiting feature sets belonging to different modalities helps in improving a significant amount of accuracy for the task of recognition. Given representations of an object in different modalities (e.g. image, text, audio etc.), to learn a unified representation of the object, has been a popular problem in the literature of multimedia retrieval. In this paper, we introduce a new iterative algorithm that learns the sparse unified representation with better accuracy in a lesser number of iterations than the previously reported results. Our algorithm employs a new fixed-point iterative scheme along with an inertial step. In order to obtain more discriminative representation, we also imposed a regularization term that utilizes the label information from the datasets. Experimental results on two real benchmark datasets demonstrate the efficacy of our method in terms of the number of iterations and accuracy.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126857311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simple, Efficient and Effective Encodings of Local Deep Features for Video Action Recognition","authors":"Ionut Cosmin Duta, B. Ionescu, K. Aizawa, N. Sebe","doi":"10.1145/3078971.3078988","DOIUrl":"https://doi.org/10.1145/3078971.3078988","url":null,"abstract":"For an action recognition system a decisive component is represented by the feature encoding part which builds the final representation that serves as input to a classifier. One of the shortcomings of the existing encoding approaches is the fact that they are built around hand-crafted features and they are not also highly competitive on encoding the current deep features, necessary in many practical scenarios. In this work we propose two solutions specifically designed for encoding local deep features, taking advantage of the nature of deep networks, focusing on capturing the highest feature response of the convolutional maps. The proposed approaches for deep feature encoding provide a solution to encapsulate the features extracted with a convolutional neural network over the entire video. In terms of accuracy our encodings outperform by a large margin the current most widely used and powerful encoding approaches, while being extremely efficient for the computational cost. Evaluated in the context of action recognition tasks, our pipeline obtains state-of-the-art results on three challenging datasets: HMDB51, UCF50 and UCF101.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130225649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 4: Multimedia Applications (Spotlight presentations)","authors":"W. Chu","doi":"10.1145/3254623","DOIUrl":"https://doi.org/10.1145/3254623","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132440069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging the Aesthetic Gap: The Wild Beauty of Web Imagery","authors":"Miriam Redi, Frank Z. Liu, Neil O'Hare","doi":"10.1145/3078971.3078972","DOIUrl":"https://doi.org/10.1145/3078971.3078972","url":null,"abstract":"To provide good results, image search engines need to rank not just the most relevant images, but also the highest quality images. To surface beautiful pictures, existing computational aesthetic models are trained with datasets from photo contest websites, dominated by professional photos. Such models fail completely in real web scenarios, where images are extremely diverse in terms of quality and type (e.g. drawings, clip-art, etc). This work aims at bridging and understanding this \"aesthetic gap\". We collect a dataset of around 100K web images with `quality' and `type' (photo vs non-photo) annotations. We design a set of visual features to describe image pictorial characteristics, and deeply analyse the peculiar beauty of web images as opposed to appealing professional images. Finally, we build a set of computational aesthetic frameworks based on deep learning and hand-crafted features that take into account the diverse quality of web images, and show that they significantly outperform traditional computational aesthetics methods on our dataset.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115628326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepHash for Image Instance Retrieval: Getting Regularization, Depth and Fine-Tuning Right","authors":"Jie Lin, Olivier Morère, A. Veillard, Ling-yu Duan, Hanlin Goh, V. Chandrasekhar","doi":"10.1145/3078971.3078983","DOIUrl":"https://doi.org/10.1145/3078971.3078983","url":null,"abstract":"This work focuses on representing very high-dimensional global image descriptors using very compact 64-1024 bit binary hashes for instance retrieval. We propose DeepHash: a hashing scheme based on deep networks. Key to making DeepHash work at extremely low bitrates are three important considerations -- regularization, depth and fine-tuning -- each requiring solutions specific to the hashing problem. In-depth evaluation shows that our scheme outperforms state-of-the-art methods over several benchmark datasets for both Fisher Vectors and Deep Convolutional Neural Network features, by up to 8.5% over other schemes. The retrieval performance with 256-bit hashes is close to that of the uncompressed floating point features -- a remarkable 512x compression.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124188197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Detect Misleading Content on Twitter","authors":"C. Boididou, S. Papadopoulos, Lazaros Apostolidis, Y. Kompatsiaris","doi":"10.1145/3078971.3078979","DOIUrl":"https://doi.org/10.1145/3078971.3078979","url":null,"abstract":"The publication and spread of misleading content is a problem of increasing magnitude, complexity and consequences in a world that is increasingly relying on user-generated content for news sourcing. To this end, multimedia analysis techniques could serve as an assisting tool to spot and debunk misleading content on the Web. In this paper, we tackle the problem of misleading multimedia content detection on Twitter in real time. We propose a number of new features and a new semi-supervised learning event adaptation approach that helps generalize the detection capabilities of trained models to unseen content, even when the event of interest is of different nature compared to that used for training. Combined with bagging, the proposed approach manages to outperform previous systems by a significant margin in terms of accuracy. Moreover, in order to communicate the verification process to end users, we develop a web-based application for visualizing the results.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124386603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Context-Aware Music Recommender Systems: Beyond the Pre-filtering Approach","authors":"M. Pichl, Eva Zangerle, Günther Specht","doi":"10.1145/3078971.3078980","DOIUrl":"https://doi.org/10.1145/3078971.3078980","url":null,"abstract":"Over the last years, music consumption has changed fundamentally: people switch from private, mostly limited music collections to huge public music collections provided by music streaming platforms. Thus, the amount of available music has increased dramatically and music streaming platforms heavily rely on recommender systems to assist users in discovering music they like. Incorporating the context of users has been shown to improve the quality of recommendations. Previous approaches based on pre-filtering suffered from a split dataset. In this work, we present a context-aware recommender system based on factorization machines that extracts information about the user's context from the names of the user's playlists. Based on a dataset comprising 15,000 users and 1.8 million tracks we show that our proposed approach outperforms the pre-filtering approach substantially in terms of accuracy of the computed recommendations.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124429588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The JORD System: Linking Sky and Social Multimedia Data to Natural Disasters","authors":"Kashif Ahmad, M. Riegler, Ans Riaz, N. Conci, Duc-Tien Dang-Nguyen, P. Halvorsen","doi":"10.1145/3078971.3079013","DOIUrl":"https://doi.org/10.1145/3078971.3079013","url":null,"abstract":"Being able to automatically link social media information and data to remote-sensed data holds large possibilities for society and research. In this paper, we present a system called JORD that is able to autonomously collect social media data about technological and environmental disasters, and link it automatically to remote-sensed data. In addition, we demonstrate that queries in local languages that are relevant to the exact position of natural disasters retrieve more accurate information about a disaster event. To show the capabilities of the system, we present some examples of disaster events detected by the system. To evaluate the quality of the provided information and usefulness of JORD from the potential users point of view we include a crowdsourced user study.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124039434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}