{"title":"Uploader models for video concept detection","authors":"B. Mérialdo, U. Niaz","doi":"10.1109/CBMI.2014.6849847","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849847","url":null,"abstract":"In video indexing, it has been noticed that a simple uploader model was able to improve the MAP of concept detection in the TRECVID Semantic Concept Indexing (SIN) task. In this paper, we explore this idea further by comparing different types of uploader models and different types of score/rank distribution. We evaluate the performance of these combinations on the best SIN 2012 runs, and explore the impact of their parameters. We observe that the improvement is generally lower for the best runs than for the weaker runs. We also observe that tuning the models for each concept independently produces a much more significant improvement.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131748087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Annotation of still images by multiple visual concepts","authors":"Abdelkader Hamadi, P. Mulhem, G. Quénot","doi":"10.1109/CBMI.2014.6849844","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849844","url":null,"abstract":"The automatic indexing of images and videos is a highly relevant and important research area in the field of multimedia information retrieval. The difficulty of this task is no longer something to prove. The majority of the efforts of the research community have been focused in the past on the detection of single concepts in images/videos, which is already a hard task. With the evolution of the information retrieval systems, users needs are more abstract, and lead to a larger number of words composing the queries. It is sensible to think about indexing multimedia documents by more than one concept, to help retrieval systems to answer such complex queries. Few studies addressed specifically the problem of detecting multiple concepts (multi-concept) in images and videos, most of them concern the detection of concept pairs. These studies showed that such challenge is even greater than the one of single concept detection. In this work, we address this problematic of mult-concept detection in still images. Two types of approaches are considered : 1) building models per multi-concept and 2) fusion of single concepts detectors. We conducted our evaluation on PASCAL VOC'12 collection regarding the detection of pairs and triplets of concepts. Our results show that the two types of approaches give globally comparable results, but they differ for specific kinds of pairs/triplets.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115069807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching images with MPEG-7 (& MPEG-7-like) Powered Localized dEscriptors: The SIMPLE answer to effective Content Based Image Retrieval","authors":"C. Iakovidou, N. Anagnostopoulos, Athanasios Ch. Kapoutsis, Y. Boutalis, S. Chatzichristofis","doi":"10.1109/CBMI.2014.6849821","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849821","url":null,"abstract":"In this paper we propose and evaluate a new technique that localizes the description ability of the well established MPEG-7 and MPEG-7-like global descriptors. We employ the SURF detector to define salient image patches of blob-like textures and use the MPEG-7 Scalable Color (SC), Color Layout (CL) and Edge Histogram (EH) descriptors and the global MPEG-7-like Color and Edge Directivity Descriptor (CEDD), to produce the final local features' vectors. In order to test the new descriptors in the most straightforward fashion, we use the Bag-Of-Visual-Words framework for indexing and retrieval. The experimental results conducted on two different benchmark databases with varying codebook sizes, revealed an astonishing boost in the retrieval performance of the proposed descriptors compared both to their own performance (in their original form) and to other state-of-the-art methods of local and global descriptors. Open-source implementation of the proposed descriptors is available in c#, Java and MATLAB.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123640862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable video summarization of cultural video documents in cross-media space based on data cube approach","authors":"Karina Ruby Perez-Daniel, M. Nakano-Miyatake, J. Benois-Pineau, S. Maabout, G. Sargent","doi":"10.1109/CBMI.2014.6849824","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849824","url":null,"abstract":"Video summarization has been a core problem to manage the growing amount of content in multimedia databases. An efficient video summary should display an overview of the video content and most of existing approaches fulfil this goal. However the information does not allow user to get all details of interest selectively and progressively. This paper proposes a scalable video summarization approach which provides multiple views and levels of details. Our method relies on the usage of cross media space and consensus clustering method. A video document is modelled as a data cube where the level of details is refined over nonconsensual features of the space. The method is designed for weakly structured content such as cultural documentaries and was tested on the INA corpus of cultural archives.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125868252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust audio fingerprinting method for content-based copy detection","authors":"Chahid Ouali, P. Dumouchel, Vishwa Gupta","doi":"10.1109/CBMI.2014.6849814","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849814","url":null,"abstract":"This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on unconventional audio fingerprints generation scheme. The robustness is achieved by generating different versions of the spectrogram matrix of the audio signal by using a threshold based on the average of the spectral values to prune this matrix. We transform each version of this pruned spectrogram matrix into a 2-D binary image. Multiple 2-D images suppress noise to a varying degree. This varying degree of noise suppression improves likelihood of one of the images matching a reference image. To speed up matching, we convert each image into an n-dimensional vector, and perform a nearest neighbor search based on this n-dimensional vector. We test this method on TRECVID 2010 content-based copy detection evaluation dataset. Experimental results show the effectiveness of such fingerprints even when the audio is distorted. We compare the proposed method to a state-of-the-art audio copy detection system. Results of this comparison show that our method achieves an improvement of 22% in localization accuracy, and lowers minimal normalized detection cost rate (min NDCR) by half for audio transformations T1 and T2.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129481398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online multimodal matrix factorization for human action video indexing","authors":"F. Páez, Jorge A. Vanegas, F. González","doi":"10.1109/CBMI.2014.6849823","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849823","url":null,"abstract":"This paper addresses the problem of searching for videos containing instances of specific human actions. The proposed strategy builds a multimodal latent space representation where both visual content and annotations are simultaneously mapped. The hypothesis behind the method is that such a latent space yields better results when built from multiple data modalities. The semantic embedding is learned using matrix factorization through stochastic gradient descent, which makes it suitable to deal with large-scale collections. The method is evaluated on a large-scale human action video dataset with three modalities corresponding to action labels, action attributes and visual features. The evaluation is based on a query-by-example strategy, where a sample video is used as input to the system. A retrieved video is considered relevant if it contains an instance of the same human action present in the query. Experimental results show that the learned multimodal latent semantic representation produces improved performance when compared with an exclusively visual representation.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121087815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ultrasound image processing based on machine learning for the fully automatic evaluation of the Carotid Intima-Media Thickness","authors":"R. Menchón-Lara, J. Sancho-Gómez","doi":"10.1109/CBMI.2014.6849839","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849839","url":null,"abstract":"Atherosclerosis is responsible for a large proportion of cardiovascular diseases (CVD), which are the leading cause of death in the world. The atherosclerotic process, mainly affecting the medium- and large-size arteries, is a degenerative condition that causes thickening and the reduction of elasticity in the blood vessels. The Intima-Media Thickness (IMT) of the Common Carotid Artery (CCA) is a reliable early indicator of atherosclerosis. Usually, it is manually measured by marking pairs of points on a B-mode ultrasound scan image of the CCA. This paper proposes an automatic image segmentation procedure for the measurement of the IMT, avoiding the user dependence and the inter-rater variability. In particular, Radial Basis Function (RBF) Networks are designed and trained by means of the Optimally Pruned-Extreme Learning Machine (OP-ELM) algorithm to classify pixels from a given ultrasound image, allowing the extraction of IMT boundaries. The suggested approach has been validated on a set of 25 ultrasound images by comparing the automatic segmentations with manual tracings.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127461359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inverse square rank fusion for multimodal search","authors":"André Mourão, Flávio Martins, João Magalhães","doi":"10.1109/CBMI.2014.6849825","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849825","url":null,"abstract":"Rank fusion is the task of combining multiple ranked document lists (ranks) into a single ranked list. It is a late fusion approach designed to improve the rankings produced by individual systems. Rank fusion techniques have been applied throughout multiple domains: e.g. combining results from multiple retrieval functions, or multimodal search where several feature spaces are common. In this paper, we present the Inverse Square Rank fusion method family, a set of novel fully unsupervised rank fusion methods based on quadratic decay and on logarithmic document frequency normalization. Our experiments created with standard Information Retrieval datasets (image and text fusion) and image datasets (image features fusion), show that ISR outperforms existing rank fusion algorithms. Thus, the proposed technique has comparable or better performance than existing state-of-the-art approaches, while maintaining a low computational complexity and avoiding the need for document scores or training data.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122794753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bag of morphological words for content-based geographical retrieval","authors":"E. Aptoula","doi":"10.1109/CBMI.2014.6849837","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849837","url":null,"abstract":"Placed in the context of geographical content-based image retrieval, in this paper we explore the description potential of morphological texture descriptors when combined with the popular bag-of-visual-words paradigm. In particular, we adapt existing global morphological texture descriptors, so that they are computed within local sub-windows and then form a vocabulary of “visual morphological words” through clustering. The resulting image features, are thus visual word histograms and are evaluated using the UC Merced Land Use-Land Cover dataset. Moreover, the local approach under study is compared against alternative global and local descriptors across a variety of settings. Despite being one of the initial attempts at localized morphological content description, the retrieval scores indicate that vocabulary based morphological content description possesses a significant discriminatory potential.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129569178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic object annotation from weakly labeled data with latent structured SVM","authors":"Christian X. Ries, Fabian Richter, Stefan Romberg, R. Lienhart","doi":"10.1109/CBMI.2014.6849838","DOIUrl":"https://doi.org/10.1109/CBMI.2014.6849838","url":null,"abstract":"In this paper we present an approach to automatic object annotation. We are given a set of positive images which all contain a certain object and our goal is to automatically determine the position of said object in each image. Our approach first applies a heuristic to identify initial bounding boxes based on color and gradient features. This heuristic is based on image and feature statistics. Then, the initial boxes are refined by a latent structured SVM training algorithm which is based on the CCCP training algorithm. We show that our approach outperforms previous work on multiple datasets.","PeriodicalId":103056,"journal":{"name":"2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114408563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}