{"title":"Tonal-based retrieval of Arabic and middle-east music by automatic makam description","authors":"Leonidas Ioannidis, E. Gómez, P. Herrera","doi":"10.1109/CBMI.2011.5972516","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972516","url":null,"abstract":"The automatic description of music from traditions that do not follow the Western notation and theory needs specifically designed tools. We investigate here the makams, which are scales in the modal music of Arabic and Middle East regions. We evaluate two approaches for classifying musical pieces from the ‘makam world’, according to their scale, by using chroma features extracted from polyphonic music signals. The first method compares the extracted features with a set of makam templates, while the second one uses trained classifiers. Both approaches provided good results (F-measure=0.69 and 0.73 respectively) on a collection of 302 pieces from 9 makam families. Furthermore, error analyses showed that certain confusions were musically coherent and that these techniques could complement each other in this particular context.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114453620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient method for the unsupervised discovery of signalling motifs in large audio streams","authors":"Armando Muscariello, G. Gravier, F. Bimbot","doi":"10.1109/CBMI.2011.5972536","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972536","url":null,"abstract":"Providing effective tools to navigate and search through long audio archives, or to monitor and classify broadcast streams, proves to be an extremely challenging task. The main issues originate from the varied nature of the patterns of interest in a composite audio environment, the massive size of such databases, and the need to perform well when prior knowledge of the audio content is scarce or absent. This paper proposes a computational architecture aimed at discovering occurrences of repeating patterns in audio streams by means of unsupervised learning. The targeted repetitions (or motifs) are called signalling, by analogy with biological nomenclature, as they refer to a broad class of audio patterns (such as jingles, songs, advertisements, etc.) frequently occurring in broadcast audio. We adapt a system originally developed for word discovery applications, and demonstrate its effectiveness in a song discovery scenario. The adaptation consists of speeding up critical parts of the computation, mostly through audio feature coarsening, to deal with the long occurrence period of repeating songs in radio streams.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114208782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic extraction of pornographic contents using radon transform based audio features","authors":"Myungjong Kim, Hoirin Kim","doi":"10.1109/CBMI.2011.5972546","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972546","url":null,"abstract":"This paper focuses on the problem of classifying pornographic sounds, such as sexual screams or moans, in order to detect and block objectionable multimedia content. To represent the large temporal variations of pornographic sounds, we propose a novel feature extraction method based on the Radon transform. The Radon transform provides a way to extract the global trend of orientations in a 2-D region, and is therefore applicable to time-frequency spectrograms over long-range segments to capture the large temporal variations of sexual sounds. The Radon feature is extracted using histograms and the flux of Radon coefficients. We adopt a Gaussian mixture model to statistically represent pornographic and non-pornographic sounds, and the test sounds are classified using a likelihood ratio test. Evaluations on several hundred pornographic and non-pornographic sound clips indicate that the proposed features achieve satisfactory results, suggesting that this approach could be used as an alternative to image-based methods.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128399065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-line characters identification in movies","authors":"Bertrand Delezoide, D. Nouri, S. Hamlaoui","doi":"10.1109/CBMI.2011.5972540","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972540","url":null,"abstract":"Character identification in video consists of assigning name labels to the persons present in a video. We explore here the online labeling of faces in movies. Previous work such as [12, 13] demonstrated promising results on learning and classifying characters using a manually annotated learning corpus. Some practical issues appear when applying this method to a large-scale movie database in permanent evolution, as the number of characters to recognize is large and continuously grows. In this paper we build on the first method, greatly extending its coverage by learning appearance models of new cast members online for each new movie. In addition, we make the following contributions: (1) we propose to apply the Active Appearance Model (AAM) tracking method in order to track local facial features over time and orientation changes, (2) we evaluate important parameters of the feature extraction, such as the position and size of local features. We report results on the movie I Am Legend demonstrating the relevance of our online approach to the problem.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134268787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query log simulation for long-term learning in image retrieval","authors":"Donn Morrison, S. Marchand-Maillet, E. Bruno","doi":"10.1109/CBMI.2011.5972520","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972520","url":null,"abstract":"In this paper we formalise a query simulation framework for the evaluation of long-term learning systems for image retrieval. Long-term learning relies on historical queries and associated relevance judgements, usually stored in query logs, in order to improve search results presented to users of the retrieval system. Evaluation of long-term learning methods requires access to query logs, preferably in large quantity. However, real-world query logs are notoriously difficult to acquire due to legitimate efforts of safeguarding user privacy. Query log simulation provides a useful means of evaluating long-term learning approaches without the need for real-world data. We introduce a query log simulator that is based on a user model of long-term learning that explains the observed relevance judgements contained in query logs. We validate simulated queries against a real-world query log of an image retrieval system and demonstrate that for evaluation purposes, the simulator is accurate on a global level.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121941801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting the long-tail of Points of Interest in tagged photo collections","authors":"Christos Zigkolis, S. Papadopoulos, Y. Kompatsiaris, A. Vakali","doi":"10.1109/CBMI.2011.5972551","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972551","url":null,"abstract":"The paper tackles the problem of matching the photos of a tagged photo collection to a list of “long-tail” Points Of Interest (PoIs), that is, PoIs that are not very popular and thus not well represented in the photo collection. Despite the significance of improving “long-tail” PoI photo retrieval for travel applications, most landmark detection methods to date have been tested on very popular landmarks. In this paper, we conduct a thorough empirical analysis comparing four baseline matching methods that rely on photo metadata, three variants of an approach that uses cluster analysis to discover PoI-related photo clusters, and a real-world retrieval mechanism (Flickr search) on a set of less popular PoIs. A user-based evaluation of the aforementioned methods is conducted on a Flickr photo collection of over 100,000 photos from 10 well-known touristic destinations in Greece. A set of 104 “long-tail” PoIs is collected for these destinations from Wikipedia, Wikimapia and OpenStreetMap. The results demonstrate that two of the baseline methods outperform Flickr search in terms of precision and F-measure, whereas two of the cluster-based methods outperform it in terms of recall and PoI coverage. We consider the results of this study valuable for enhancing the indexing of pictorial content in social media sites.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127948522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive video search and browsing systems","authors":"M. Bertini, A. Bimbo, Andrea Ferracani, Daniele Pezzatini","doi":"10.1109/CBMI.2011.5972543","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972543","url":null,"abstract":"In this paper we present two interactive systems for video search and browsing: one is a web application based on the Rich Internet Application paradigm, designed to obtain the levels of responsiveness and interactivity typical of a desktop application, while the other exploits multi-touch devices to implement a multi-user collaborative application. Both systems use the same ontology-based video search engine, which is capable of expanding user queries through ontology reasoning, and they let users search for specific video segments that contain a semantic concept, or browse the content of video collections when it is too difficult to express a specific query.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129055198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Medical image modality classification and retrieval","authors":"G. Csurka, S. Clinchant, Guillaume Jacquet","doi":"10.1109/CBMI.2011.5972544","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972544","url":null,"abstract":"The aim of this paper is to explore different medical image modality classification and retrieval strategies. First, we analyze how current state-of-the-art image representations (bags of visual words and Fisher Vectors) perform when used for medical modality classification. Then we integrate these representations into a content-based image retrieval system and test them on a medical image retrieval task. Finally, in both cases, we explore how performance can be improved by combining visual with textual information. To show the performance of the different systems, we compare our approaches to the systems that participated in the Medical Task of the latest ImageCLEF Challenge [16].","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115593708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A content-based system for music recommendation and visualization of user preferences working on semantic notions","authors":"D. Bogdanov, Martín Haro, Ferdinand Fuhrmann, Anna Xambó, E. Gómez, P. Herrera","doi":"10.1109/CBMI.2011.5972554","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972554","url":null,"abstract":"The amount of digital music has grown unprecedentedly during the last years and requires the development of effective methods for search and retrieval. In particular, content-based preference elicitation for music recommendation is a challenging problem that is effectively addressed in this paper. We present a system which automatically generates recommendations and visualizes a user's musical preferences, given her/his accounts on popular online music services. Using these services, the system retrieves a set of tracks preferred by a user, and further computes a semantic description of musical preferences based on raw audio information. For the audio analysis we used the capabilities of the Canoris API. Thereafter, the system generates music recommendations, using a semantic music similarity measure, and a user's preference visualization, mapping semantic descriptors to visual elements.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124098509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text detection and recognition for person identification in videos","authors":"Johann Poignant, F. Thollard, G. Quénot, L. Besacier","doi":"10.1109/CBMI.2011.5972553","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972553","url":null,"abstract":"This article presents a demo of person search in audiovisual broadcasts using only the text available in a video and in resources external to the video. We also present the different steps used to recognize characters in video for multi-modal person recognition systems. Text detection is performed using text features (texture, color, contrast, geometry, temporal information). The text recognition itself is performed by the free Google Tesseract software. The method was successfully evaluated on a broadcast news corpus containing 59 videos from the France 2 French TV channel.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115749088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}