{"title":"Semantic access of frontal face images: the expression-invariant problem","authors":"Aleix M. Mart nez","doi":"10.1109/IVL.2000.853840","DOIUrl":"https://doi.org/10.1109/IVL.2000.853840","url":null,"abstract":"Semantic queries to a database of images are more desirable than low-level feature queries, because they facilitate the user's task. One such approach is object-related image retrieval. In the context of face images, it is of interest to retrieve images based on people's names and facial expressions. However, when images of the database are allowed to appear at different facial expressions, the face recognition approach encounters the expression-invariant problem, i.e. how to robustly identify a person's face for which its learning and testing face images differ in facial expression. This paper presents a new local, probabilistic approach that accounts for this (as well as other previous studied) difficulty.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126182714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content based image retrieval through object extraction and querying","authors":"A. H. Kam, T. Ng, N. Kingsbury, W. Fitzgerald","doi":"10.1109/IVL.2000.853846","DOIUrl":"https://doi.org/10.1109/IVL.2000.853846","url":null,"abstract":"We propose a content based image retrieval system based on object extraction through image segmentation. A general and powerful multiscale segmentation algorithm automates the segmentation process, the output of which is assigned novel colour and texture descriptors which are both efficient and effective. Query strategies consisting of a semi-automated and a fully automated mode are developed which are shown to produce good results. We then show the superiority of our approach over the global histogram approach which proves that the ability to access images at the level of objects is essential for CBIR.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125591641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hidden Markov model approach to the structure of documentaries","authors":"Tiecheng Liu, J. Kender","doi":"10.1109/IVL.2000.853850","DOIUrl":"https://doi.org/10.1109/IVL.2000.853850","url":null,"abstract":"We have hand-segmented two very long documentaries (100 minutes total) into their component shots. As with other extended videos, shot distribution again appears to be log-normal. Shot lengths are similar to those in dramas, comedies, or action films, but much shorter than those in home videos. The use of fades appears to be an important device to signal transitions between semantic units. We have sought evidence for shot composition rules by means of hidden Markov models (HMMs). We find that camera motion (tilt, pan, zoom) is not significantly governed by rules. However, the bulk of the documentaries take the form of an alternation between commentators and several types of primary supporting material; additionally, the documentaries end with a visual summary. We find that the best approach is one that trains the HMM with labeled subsequences that have approximately equal elapsed time, rather than subsequences with an equal number of shots, or subsequences with shots aligned to some semantic event. This may reflect fundamental temporal limits on human visual attention. We propose that such an underlying structure can suggest more human-sensitive designs for the analysis and graphic display of the contents of extended videos, for summarization, browsing and indexing.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134089748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Index trees for efficient deformable shape-based retrieval","authors":"Lifeng Liu, S. Sclaroff","doi":"10.1109/IVL.2000.853845","DOIUrl":"https://doi.org/10.1109/IVL.2000.853845","url":null,"abstract":"An improved method for deformable shape-based image indexing and retrieval is described. A pre-computed index tree is used to improve the speed of our previously reported online model fitting method; simple shape features are used as keys in a pre-generated index tree of model instances. A coarse-to-fine indexing scheme is used at different levels of the tree to further improve speed. Experimental results show that the speedup is significant, while the accuracy of shape-based indexing is maintained. A method for shape population-based retrieval is also described. The method allows query formulation based on the population distributions of shapes in each image. Results of population-based queries for a database of blood cell micrographs are shown.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131271572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color indexing using wavelet-based salient points","authors":"N. Sebe, Q. Tian, E. Loupias, M. Lew, T. S. Huang","doi":"10.1109/IVL.2000.853833","DOIUrl":"https://doi.org/10.1109/IVL.2000.853833","url":null,"abstract":"Color is an important attribute for image matching and retrieval. Most of the attention from the research literature has been focused on color indexing techniques based on global color distributions. However, these global distributions have limited discriminating power because they are unable to capture local color information. We present a wavelet-based salient point extraction algorithm. We show that extracting the color information in the locations given by these points provides significantly improved retrieval results as compared to the global color feature approaches.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125659756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system for effortless content annotation to unfold the semantics in videos","authors":"R. Lienhart","doi":"10.1109/IVL.2000.853838","DOIUrl":"https://doi.org/10.1109/IVL.2000.853838","url":null,"abstract":"We propose and investigate a new but simple and natural extension of the way people record video. This extension allows one to unfold the semantics of video clips and thus enables a completely new set of applications for raw video footage. Two microphones are connected to a camcorder: a headworn speech input microphone and an environmental microphone. During recording the cameraman speaks out loud content-descriptive annotations and/or editing commands. Due to the two-microphones setup the sound of annotations and editing commands can be removed from the environmental audio by adaptive filtering enabling people to play back the video as if there had been no annotations. Simultaneously, these annotations are transcribed to ASCII by means of a standard speech recognition engine. The viability of this approach is demonstrated by means of an important application for video libraries: the automatic abstraction of raw video footage.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123165007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relevance feedback decision trees in content-based image retrieval","authors":"Sean D. MacArthur, C. Brodley, C. Shyu","doi":"10.1109/IVL.2000.853842","DOIUrl":"https://doi.org/10.1109/IVL.2000.853842","url":null,"abstract":"Significant time and effort has been devoted to finding feature representations of images in databases in order to enable content-based image retrieval (CBIR). Relevance feedback is a mechanism for improving retrieval precision over time by allowing the user to implicitly communicate to the system which of these features are relevant and which are not. We propose a relevance feedback retrieval system that, for each retrieval iteration, learns a decision tree to uncover a common thread between all images marked as relevant. This tree is then used as a model for inferring which of the unseen images the user would not likely desire. We evaluate our approach within the domain of HRCT images of the lung.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125229920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian relevance feedback for content-based image retrieval","authors":"Nuno Vasconcelos, Andrew Lippman","doi":"10.1109/IVL.2000.853841","DOIUrl":"https://doi.org/10.1109/IVL.2000.853841","url":null,"abstract":"We present a Bayesian learning algorithm that relies on belief propagation to integrate feedback provided by the user over a retrieval session. Bayesian retrieval leads to a natural criteria for evaluating local image similarity without requiring any image segmentation. This allows the practical implementation of retrieval systems where users can provide image regions, or objects, as queries. Region-based queries are significantly less ambiguous than queries based on entire images leading to significant improvements in retrieval precision. When combined with local similarity, Bayesian belief propagation is a powerful paradigm for user interaction. Experimental results show that significant improvements in the frequency of convergence to the relevant images can be achieved by the inclusion of learning in the retrieval process.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122268929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content description for efficient video navigation, browsing and personalization","authors":"P. V. van Beek, I. Sezan, D. Ponceleón, A. Amir","doi":"10.1109/IVL.2000.853837","DOIUrl":"https://doi.org/10.1109/IVL.2000.853837","url":null,"abstract":"Content descriptions are commonly used to index audiovisual content for search and retrieval applications. We present multimedia descriptions that can be used to facilitate rapid navigation, browsing, and efficient access to different views of audiovisual programs according to personal preferences and usage conditions. In particular, we illustrate the use of several description schemes that are currently being developed as part of the MPEG-7 standardization activity. We describe usage scenarios in two application areas: personal video recorder appliances and education and training systems.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124365043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query expansion for imperfect speech: applications in distributed learning","authors":"S. Srinivasan, D. Ponceleón, D. Petkovic, M. Viswanathan","doi":"10.1109/IVL.2000.853839","DOIUrl":"https://doi.org/10.1109/IVL.2000.853839","url":null,"abstract":"Advances in speech recognition technology have shown encouraging results for spoken document retrieval where the average precision often approaches 70% of that achieved for perfect text transcriptions. Typical applications of spoken document retrieval pertain to retrieval of stories from archived video/audio assets. In the CueVideo project, our application focus is spoken document retrieval from a video database for just-in-time training/distributed learning. Typical content is not pre-segmented, has no predefined structure, is of varying audio quality, and may not have domain specific data available. For such content, we propose a two level search, namely, a first level search across the entire video collection, and a second level search within a specific video. At both search levels, we perform an experimental evaluation of a combination of new and existing query expansion methods, intended to offset retrieval errors due to misrecognition.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132425480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}