{"title":"Efficient indexing structures for fast media search and browsing","authors":"Marco Teixeira, João Magalhães","doi":"10.1109/CBMI.2011.5972533","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972533","url":null,"abstract":"Fast media search and browsing is today a growing need, driven by the challenge of managing large collections of personal media. Traditional databases (e.g. MySQL) and text databases (e.g. Lucene) do not address an important aspect of multimedia data: its high dimensionality. In this paper, we describe the implementation and evaluation of high-dimensional data indexing structures for fast search and browsing. Index structures for high-dimensional data have been widely researched, and several proposals exist in the literature. We compare five popular index structures in a large-scale image retrieval scenario with different visual features of varying dimensionality. Both indexing and search aspects are evaluated: indexing time, search time, and the trade-off between precision and performance.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128692043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spoken WordCloud: Clustering recurrent patterns in speech","authors":"Rémi Flamary, Xavier Anguera Miró, Nuria Oliver","doi":"10.1109/CBMI.2011.5972534","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972534","url":null,"abstract":"The automatic summarization of speech recordings is typically carried out as a two-step process: the speech is first decoded using an automatic speech recognition system, and the resulting text transcripts are then processed to create a summary. However, this approach might not be suitable in adverse acoustic conditions or when applied to languages with limited training resources. In order to address these limitations, in this paper we propose an automatic speech summarization method that is based on the automatic discovery of recurrent patterns in the speech: recurrent acoustic patterns are first extracted from the audio and are then clustered and ranked according to the number of repetitions, creating an approximate acoustic summary of what was spoken. This approach allows us to build what we call a “Spoken WordCloud”, named after its similarity to text-based word clouds. We present an algorithm that achieves a cluster purity of up to 90% and an inverse purity of 71% in preliminary experiments using a small dataset of connected spoken words.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124435354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised anchorpersons differentiation in news video","authors":"M. Broilo, A. Basso, F. D. Natale","doi":"10.1109/CBMI.2011.5972531","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972531","url":null,"abstract":"The automatic extraction of video structure from content is of key importance to enable a variety of multimedia services that span from search and retrieval to content manipulation. An unsupervised, independent unimodal clustering method for anchorperson detection and differentiation in newscasts is presented in this paper. The algorithm exploits audio, frame, and face information to identify the major cast in the content. These three components are first processed independently during the cluster analysis and then jointly in a compositional mining phase. A differentiation of the roles played by the people in the video has been implemented by exploiting the temporal characteristics of the detected anchorpersons. Experiments show significant precision/recall results, thus opening further research directions in video analysis, particularly when the content is highly structured, as in TV newscasts.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125644843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient video summarization and retrieval tools","authors":"Víctor Valdés, J. Sanchez","doi":"10.1109/CBMI.2011.5972518","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972518","url":null,"abstract":"In this paper we describe the video browsing and retrieval techniques included within the ASSETS project system, focused on providing enhanced access to video repositories. The proposed mechanisms aim to provide efficient and reusable techniques for browsing and retrieval, minimizing the computational and storage cost of the approach while offering novel functionalities such as personalized, real-time video summarization. The system is under design and development within the ASSETS project, which deals with advanced tools for accessing cultural content.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"50 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114319425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using LIDO to handle 3D cultural heritage documentation data provenance","authors":"D. Pitzalis, F. Niccolucci, M. Cord","doi":"10.1109/CBMI.2011.5972517","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972517","url":null,"abstract":"It is important for Digital Libraries (DL) to be flexible in exposing their content. Typically, a DL provides a search/browse interface that allows resources to be found, and a service that makes the data available for harvesting from/to other DLs. This kind of communication is possible because the structures of different DLs are expressed following formal specifications. In particular in Cultural Heritage, where we need to describe an extremely heterogeneous environment, some metadata standards are emerging and mappings are being proposed to allow metadata exchange and enrichment. CIDOC-CRM is an ontology designed to mediate content in the area of tangible cultural heritage and was published as the ISO 21127:2006 standard. Lately, an extension of CIDOC-CRM, known as CRMdig, enables documenting information about data provenance and digital surrogates in a very precise way. Another metadata schema suitable for handling museum-related data is LIDO. In this paper we propose a case study showing how CIDOC-CRMdig and LIDO handle the digital information of an object, and especially its data provenance.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125480126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised scene detection in Olympic video using multi-modal chains","authors":"Gert-Jan Poulisse, Marie-Francine Moens","doi":"10.1109/CBMI.2011.5972529","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972529","url":null,"abstract":"This paper presents a novel unsupervised method for identifying the semantic structure in long semi-structured video streams. We identify ‘chains’, local clusters of repeated features from both the video stream and audio transcripts. Each chain serves as an indicator that the temporal interval it demarcates is part of the same semantic event. By layering all the chains over each other, dense regions emerge from the overlapping chains, from which we can identify the semantic structure of the video. We analyze two clustering strategies that accomplish this task.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122682447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive social, spatial and temporal querying for multimedia retrieval","authors":"G. C. D. Silva, K. Aizawa, Yuki Arase, Xing Xie","doi":"10.1109/CBMI.2011.5972512","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972512","url":null,"abstract":"We propose a scheme for faster and more effective retrieval of temporal, spatial and social multimedia from large collections. We define interactive multimedia queries that allow simultaneous query refinement on multiple search dimensions. User interaction techniques based on line and iconic sketches allow specifying queries based on the above definition. We prototype a multi-user travel media network and implement the proposed user interaction techniques for retrieving locomotion patterns of the users. The proposed queries facilitate easy input and refinement of queries, and efficient retrieval.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114809969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binary SIFT: Fast image retrieval using binary quantized SIFT features","authors":"K. A. Peker","doi":"10.1109/CBMI.2011.5972548","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972548","url":null,"abstract":"SIFT features are widely used in content-based image retrieval. Typically, a few thousand keypoints are extracted from each image. Image matching involves distance computations across all pairs of SIFT feature vectors from both images, which is quite costly. We show that SIFT features perform surprisingly well even after quantizing each component to binary, when the medians are used as the quantization thresholds. Quantized features preserve both distinctiveness and matching properties. Almost all of the features in our 5.4 million feature test set map to distinct binary patterns after quantization. Furthermore, the number of matches between images using the original and the binary quantized SIFT features is quite similar. We investigate the distribution of SIFT features and observe that the space of 128-D binary vectors has sufficient capacity for the current performance of SIFT features. We use component median values as quantization thresholds and show, through vector-to-vector distance comparisons and image-to-image matches, that the resulting binary vectors perform comparably to the original SIFT vectors. We also discuss computational and storage gains. Binary vector distance computation reduces to bit-wise operations, and the square operation is eliminated. Fast and efficient indexing techniques, such as the signatures used for chemical databases, can also be considered.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131406192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised object recognition using flickr images","authors":"E. Chatzilari, S. Nikolopoulos, S. Papadopoulos, Christos Zigkolis, Y. Kompatsiaris","doi":"10.1109/CBMI.2011.5972550","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972550","url":null,"abstract":"In this work we present an algorithm for extracting region-level annotations from Flickr images, using a small set of manually labelled regions to guide the selection process. More specifically, we construct a set of Flickr images that focuses on a certain concept and apply a novel graph-based clustering algorithm on their regions. Then, we select the cluster or clusters that correspond to the examined concept, guided by the manually labelled data. Experimental results show that although the obtained regions are of lower quality compared to the manually labelled regions, the gain in effort compensates for the loss in performance.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131456377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ImmEx: IMMersive text documents exploration system","authors":"Mario Cataldi, Luigi Di Caro, C. Schifanella","doi":"10.1109/CBMI.2011.5972511","DOIUrl":"https://doi.org/10.1109/CBMI.2011.5972511","url":null,"abstract":"Common search engines, especially web-based ones, rely on standard keyword-based queries and matching algorithms using word frequencies, topic recentness, document authority and/or thesauri. However, even if these systems employ efficient retrieval algorithms, they cannot lead the user into an intuitive exploration of large data collections because of their cumbersome presentation of results (e.g. long lists of entries). Moreover, these methods do not provide any mechanism to retrieve other relevant information associated with those contents, and even when query refinement methods are offered, they are hard to use because of the user's inexperience and common lack of familiarity with the terminology. Therefore, we propose ImmEx, a novel visual navigational system for immersive exploration of text documents that overcomes these problems by leveraging the intuitiveness of semantically-related images, retrieved in real time from popular image sharing services. ImmEx lets users independently explore large text collections through a novel approach that exploits the directness of the images and their user-generated metadata. We finally analyze the efficiency and usability of the proposed system through case and user studies.","PeriodicalId":358337,"journal":{"name":"2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116692795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}