Title: The effect of different video summarization models on the quality of video recommendation based on low-level visual features
Authors: Yashar Deldjoo, P. Cremonesi, M. Schedl, Massimo Quadrana
DOI: https://doi.org/10.1145/3095713.3095734
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, June 2017
Abstract: Video summarization is a powerful tool for video understanding and browsing and is considered an enabler for many video analysis tasks. While the effect of video summarization models has been studied extensively in video retrieval and indexing applications over the last decade, its impact has not been well investigated in content-based video recommendation systems (RSs) based on low-level visual features, where the goal is to recommend items/videos to users based on the visual content of the videos. This work reveals specific problems related to video summarization and their impact on video recommendation. We present preliminary results of an analysis in which different video summarization models are applied to the problem of video recommendation on a real-world RS dataset (MovieLens-10M), and show how temporal feature aggregation and video segmentation granularity can significantly influence, and improve, the quality of the recommendations.

Title: Tag Propagation Approaches within Speaking Face Graphs for Multimodal Person Discovery
Authors: G. B. Fonseca, I. Freire, Zenilton K. G. Patrocínio, S. Guimarães, G. Sargent, R. Sicre, G. Gravier
DOI: https://doi.org/10.1145/3095713.3095729
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, June 2017
Abstract: The indexing of broadcast TV archives is a current problem in multimedia research. As the size of these databases grows continuously, meaningful features are needed to describe and connect their elements efficiently, such as the identification of speaking faces. In this context, this paper focuses on two approaches for unsupervised person discovery. Initial tags for speaking faces are provided by an OCR-based method, and these tags propagate through a graph model based on audiovisual relations between speaking faces. Two propagation methods are proposed, one based on random walks and the other on a hierarchical approach. To better evaluate their performance, these methods were compared with two graph-clustering baselines. We also study the impact of different modality fusions on the graph-based tag propagation scenario. A quantitative analysis shows that the graph propagation techniques always outperform the baselines. Among all compared strategies, the methods based on hierarchical propagation with late fusion and on random walks with score fusion obtained the highest MAP values. Finally, even though these two methods produce highly equivalent results according to the Kappa coefficient, the random walk method performs better according to a paired t-test, and the computing time for hierarchical propagation is more than four times lower than that of random walk propagation.

{"title":"Semi-automatic Video Assessment System","authors":"Pedro Martins, N. Correia","doi":"10.1145/3095713.3095748","DOIUrl":"https://doi.org/10.1145/3095713.3095748","url":null,"abstract":"This paper describes a system for semi-automatic quality assessment of user generated content (UGC) from large events. It uses image and video processing techniques1 combined with a computational quality model that takes in account aesthetics and how human visual perception and attention mechanisms discriminate visual interest. We describe the approach and show that the developed system allows to sort and filter a large stream of UGC in an efficient and timely manner.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122206396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Approximate Medoids of Temporal Sequences","authors":"W. Bailer","doi":"10.1145/3095713.3095717","DOIUrl":"https://doi.org/10.1145/3095713.3095717","url":null,"abstract":"In order to compactly represent a set of data, its medoid (the element with minimum summed distance to all other elements) is a useful choice. This has applications in clustering, compression and visualisation of data. In multimedia data, the set of data is often sampled as a sequence in time or space, such as a video shot or views of a scene. The exact calculation of the medoid may be costly, especially if the distance function between elements is not trivial. While approximation methods for medoid selection exist, we show in this work that they do not perform well on sequences of images. We thus propose a novel algorithm for efficiently selecting an approximate medoid of a temporal sequence and assess its performance on two large-scale video data sets.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129522915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Detection and Classification of Bleeding Region in WCE Images using Color Feature
Authors: S. Suman, F. Hussin, A. Malik, Konstantin Pogorelov, M. Riegler, Shiaw-Hooi Ho, I. Hilmi, K. Goh
DOI: https://doi.org/10.1145/3095713.3095731
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, June 2017
Abstract: Wireless capsule endoscopy (WCE) is a modern and efficient technology for examining the complete gastrointestinal tract (GIT) for various abnormalities. Due to its long recording time, WCE acquires a huge number of images, and it is very tedious for clinical experts to inspect each and every frame of a complete video. In this paper, an automated color-feature-based technique for bleeding detection is proposed. In the case of bleeding, color is a very important feature for efficient information extraction. Our algorithm is based on statistical color feature analysis, and we use a support vector machine (SVM) to classify WCE video frames into bleeding and non-bleeding classes at a high processing speed. An experimental evaluation shows that our method has promising bleeding detection performance, with sensitivity and specificity higher than existing approaches.

Title: Detecting adversarial example attacks to deep neural networks
Authors: F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli
DOI: https://doi.org/10.1145/3095713.3095753
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, June 2017
Abstract: Deep learning has recently become the state of the art in many computer vision applications, and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines, containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network by analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. The results show that hidden layer activations can be used to detect incorrect classifications caused by adversarial attacks.

Title: Connoisseur: classification of styles of Mexican architectural heritage with deep learning and visual attention prediction
Authors: A. M. Obeso, M. García-Vázquez, A. A. Ramírez-Acosta, J. Benois-Pineau
DOI: https://doi.org/10.1145/3095713.3095730
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, June 2017
Abstract: The automatic description of multimedia content has mainly been developed for classification tasks, retrieval systems and the large-scale ordering of data. The preservation of cultural heritage is an important application field for these methods. Our problem is the classification of the architectural styles of buildings in digital photographs of Mexican cultural heritage. Selecting the relevant content in a scene when training classification models allows them to be more precise in the classification task. Here we use a saliency-driven approach to predict visual attention in images and use it to train a convolutional neural network (CNN) to identify the architectural style of Mexican buildings. We also present an analysis of the behaviour of models trained on traditionally cropped images versus saliency maps. We show that the performance of the saliency-based CNNs is better than that of traditional training, reaching a classification rate of 97% on the validation dataset. We consider that style identification with this technique can contribute widely to video description tasks, specifically the automatic documentation of Mexican cultural heritage.

Title: Towards large scale multimedia indexing: A case study on person discovery in broadcast news
Authors: N. Le, H. Bredin, G. Sargent, Miquel India, Paula Lopez-Otero, C. Barras, Camille Guinaudeau, G. Gravier, G. B. Fonseca, I. Freire, Zenilton K. G. Patrocínio, S. Guimarães, Gerard Martí, J. Morros, J. Hernando, Laura Docío Fernández, C. García-Mateo, S. Meignier, J. Odobez
DOI: https://doi.org/10.1145/3095713.3095732
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, June 2017
Abstract: The rapid growth of multimedia databases and human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery in the absence of prior identity knowledge requires the accurate association of audio-visual cues with detected names. To this end, we present three different strategies for approaching this problem: clustering-based naming, verification-based naming, and graph-based naming. Each of these strategies utilizes different recent advances in unsupervised face/speech representation, verification, and optimization. To provide a better understanding of the approaches, this paper also provides a quantitative and qualitative comparative study of them using the corpus of the Person Discovery challenge at MediaEval 2016. From the results of our experiments, we can observe the pros and cons of each approach, paving the way for promising future research directions.

Title: Music Feature Maps with Convolutional Neural Networks for Music Genre Classification
Authors: Christine Sénac, Thomas Pellegrini, Florian Mouret, J. Pinquier
DOI: https://doi.org/10.1145/3095713.3095733
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, June 2017
Abstract: Nowadays, deep learning is used more and more for music genre classification, particularly convolutional neural networks (CNNs) taking as input a spectrogram treated as an image in which different types of structure are sought. However, in response to the criticism that it is difficult to understand the underlying relationships that neural networks learn from a spectrogram, we propose to use as CNN inputs a small set of eight music features chosen along three main musical dimensions: dynamics, timbre and tonality. With CNNs trained so that the filter dimensions are interpretable in time and frequency, results show that only eight music features are more efficient than the 513 frequency bins of a spectrogram, and that late score fusion between systems based on both feature types reaches 91% accuracy on the GTZAN database.

{"title":"Live Collaborative Social-Media Video Timelines","authors":"Rui Queiros, N. Correia, João Magalhães","doi":"10.1145/3095713.3095750","DOIUrl":"https://doi.org/10.1145/3095713.3095750","url":null,"abstract":"In this paper, we propose a collaborative system to let users share their own videos and interact among themselves to collaboratively do a video coverage of live events. Our intention is to motivate users to make positive contributions to the comprehensiveness of available videos about that event. To achieve this we propose a collaborative video framework, named LiveTime, allowing users to shared information timelines of real-world events. With this solution we offer collaboration features that go beyond existing systems like Youtube and Vimeo. The paper describes the rational and main concepts, the implementation and the results of a preliminary user study.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125381654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}