{"title":"Head and gaze dynamics in visual attention and context learning","authors":"A. Doshi, M. Trivedi","doi":"10.1109/CVPRW.2009.5204215","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204215","url":null,"abstract":"Future intelligent environments and systems may need to interact with humans while simultaneously analyzing events and critical situations. Assistive living, advanced driver assistance systems, and intelligent command-and-control centers are just a few of these cases where human interactions play a critical role in situation analysis. In particular, the behavior or body language of the human subject may be a strong indicator of the context of the situation. In this paper we demonstrate how the interaction of a human observer's head pose and eye gaze behaviors can provide significant insight into the context of the event. Such semantic data derived from human behaviors can be used to help interpret and recognize an ongoing event. We present examples from driving and intelligent meeting rooms to support these conclusions, and demonstrate how to use these techniques to improve contextual learning.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129690818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative hierarchical models for image analysis","authors":"S. Geman","doi":"10.1109/CVPRW.2009.5204335","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204335","url":null,"abstract":"A probabilistic grammar for the groupings and labeling of parts and objects, when taken together with pose and part-dependent appearance models, constitutes a generative scene model and a Bayesian framework for image analysis. To the extent that the generative model generates features, as opposed to pixel intensities, the \"inverse\" or \"posterior distribution\" on interpretations given images is based on incomplete information; feature vectors are generally insufficient to recover the original intensities. I will argue for fully generative scene models, meaning models that in principle generate actual digital pictures. I will outline an approach to the construction of fully generative models through an extension of context-sensitive grammars and a re-formulation of the popular template models for image fragments. Mostly I will focus on the problem of constructing pixel-level appearance models. I will propose an approach based on image-fragment templates, as introduced by Ullman and others. However, rather than using a correlation between a template and a given image patch as an extracted feature.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132012316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face recognition by fusion of local and global matching scores using DS theory: An evaluation with uni-classifier and multi-classifier paradigm","authors":"D. Kisku, M. Tistarelli, J. Sing, Phalguni Gupta","doi":"10.1109/CVPRW.2009.5204298","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204298","url":null,"abstract":"Faces are highly deformable objects which may easily change their appearance over time. Not all face areas are subject to the same variability. Therefore decoupling the information from independent areas of the face is of paramount importance to improve the robustness of any face recognition technique. This paper presents a robust face recognition technique based on the extraction and matching of SIFT features related to independent face areas. Both a global and local (as recognition from parts) matching strategy is proposed. The local strategy is based on matching individual salient facial SIFT features as connected to facial landmarks such as the eyes and the mouth. As for the global matching strategy, all SIFT features are combined together to form a single feature. In order to reduce the identification errors, the Dempster-Shafer decision theory is applied to fuse the two matching techniques. The proposed algorithms are evaluated with the ORL and the IITK face databases. The experimental results demonstrate the effectiveness and potential of the proposed face recognition techniques also in the case of partially occluded faces or with missing information.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130767528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling and exploiting the spatio-temporal facial action dependencies for robust spontaneous facial expression recognition","authors":"Yan Tong, Jixu Chen, Q. Ji","doi":"10.1109/CVPRW.2009.5204263","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204263","url":null,"abstract":"Facial action provides various types of messages for human communications. Recognizing spontaneous facial actions, however, is very challenging due to subtle facial deformation, frequent head movements, and ambiguous and uncertain facial motion measurements. As a result, current research in facial action recognition is limited to posed facial actions and often in frontal view.Spontaneous facial action is characterized by rigid head movements and nonrigid facial muscular movements. More importantly, it is the spatiotemporal interactions among the rigid and nonrigid facial motions that produce a meaningful and natural facial display. Recognizing this fact, we introduce a probabilistic facial action model based on a dynamic Bayesian network (DBN) to simultaneously and coherently capture rigid and nonrigid facial motions, their spatiotemporal dependencies, and their image measurements. Advanced machine learning methods are introduced to learn the probabilistic facial action model based on both training data and prior knowledge. Facial action recognition is accomplished through probabilistic inference by systemically integrating measurements official motions with the facial action model. Experiments show that the proposed system yields significant improvements in recognizing spontaneous facial actions.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114296754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geometric Sequence (GS) imaging with Bayesian smoothing for optical and capacitive imaging sensors","authors":"K. Sengupta, F. Porikli","doi":"10.1109/CVPRW.2009.5205205","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5205205","url":null,"abstract":"In this paper, we introduce a novel technique called geometric sequence (GS) imaging, specifically for the purpose of low power and light weight tracking in human computer interface design. The imaging sensor is programmed to capture the scene with a train of packets, where each packet constitutes a few images. The delay or the baseline associated with consecutive image pairs in a packet follows a fixed ratio, as in a geometric sequence. The image pair with shorter baseline or delay captures fast motion, while the image pair with larger baseline or delay captures slow motion. Given an image packet, the motion confidence maps computed from the slow and the fast image pairs are fused into a single map. Next, we use a Bayesian update scheme to compute the motion hypotheses probability map, given the information of prior packets. We estimate the motion from this probability map. The GS imaging system reliably tracks slow movements as well as fast movements, a feature that is important in realizing applications such as a touchpad type system. Compared to continuous imaging with short delay between consecutive pairs, the GS imaging technique enjoys several advantages. The overall power consumption and the CPU load are significantly low. We present results in the domain of optical camera based human computer interface (HCI) applications, as well as for capacitive fingerprint imaging sensor based touch pad systems.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124683947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face recognition at-a-distance based on sparse-stereo reconstruction","authors":"H. Rara, S. Elhabian, Asem M. Ali, Mike Miller, T. Starr, A. Farag","doi":"10.1109/CVPRW.2009.5204301","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204301","url":null,"abstract":"We describe a framework for face recognition at a distance based on sparse-stereo reconstruction. We develop a 3D acquisition system that consists of two CCD stereo cameras mounted on pan-tilt units with adjustable baseline. We first detect the facial region and extract its landmark points, which are used to initialize an AAM mesh fitting algorithm. The fitted mesh vertices provide point correspondences between the left and right images of a stereo pair; stereo-based reconstruction is then used to infer the 3D information of the mesh vertices. We perform experiments regarding the use of different features extracted from these vertices for face recognition. The cumulative rank curves (CMC), which are generated using the proposed framework, confirms the feasibility of the proposed work for long distance recognition of human faces with respect to the state-of-the-art.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130841135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geo-location inference from image content and user tags","authors":"Andrew C. Gallagher, D. Joshi, Jie Yu, Jiebo Luo","doi":"10.1109/CVPRW.2009.5204168","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204168","url":null,"abstract":"Associating image content with their geographic locations has been increasingly pursued in the computer vision community in recent years. In a recent work, large collections of geotagged images were found to be helpful in estimating geo-locations of query images by simple visual nearest-neighbors search. In this paper, we leverage user tags along with image content to infer the geo-location. Our model builds upon the fact that the visual content and user tags of pictures can provide significant hints about their geo-locations. Using a large collection of over a million geotagged photographs, we build location probability maps of user tags over the entire globe. These maps reflect the picture-taking and tagging behaviors of thousands of users from all over the world, and reveal interesting tag map patterns. Visual content matching is performed using multiple feature descriptors including tiny images, color histograms, GIST features, and bags of textons. The combination of visual content matching and local tag probability maps forms a strong geo-inference engine. Large-scale experiments have shown significant improvements over pure visual content-based geo-location inference.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132684282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cortical enhanced tissue segmentation of neonatal brain MR images acquired by a dedicated phased array coil","authors":"F. Shi, P. Yap, Yong Fan, Jie-Zhi Cheng, L. Wald, G. Gerig, Weili Lin, D. Shen","doi":"10.1109/CVPRW.2009.5204348","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204348","url":null,"abstract":"The acquisition of high quality MR images of neonatal brains is largely hampered by their characteristically small head size and low tissue contrast. As a result, subsequent image processing and analysis, especially for brain tissue segmentation, are often hindered. To overcome this problem, a dedicated phased array neonatal head coil is utilized to improve MR image quality by effectively combing images obtained from 8 coil elements without lengthening data acquisition time. In addition, a subject-specific atlas based tissue segmentation algorithm is specifically developed for the delineation of fine structures in the acquired neonatal brain MR images. The proposed tissue segmentation method first enhances the sheet-like cortical gray matter (GM) structures in neonatal images with a Hessian filter for generation of cortical GM prior. Then, the prior is combined with our neonatal population atlas to form a cortical enhanced hybrid atlas, which we refer to as the subject-specific atlas. Various experiments are conducted to compare the proposed method with manual segmentation results, as well as with additional two population atlas based segmentation methods. Results show that the proposed method is capable of segmenting the neonatal brain with the highest accuracy, compared to other two methods.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129251040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lonely but attractive: Sparse color salient points for object retrieval and categorization","authors":"Julian Stöttinger, A. Hanbury, T. Gevers, N. Sebe","doi":"10.1109/CVPRW.2009.5204286","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204286","url":null,"abstract":"Local image descriptors computed in areas around salient points in images are essential for many algorithms in computer vision. Recent work suggests using as many salient points as possible. While sophisticated classifiers have been proposed to cope with the resulting large number of descriptors, processing this large amount of data is computationally costly. In this paper, computational methods are proposed to compute salient points designed to allow a reduction in the number of salient points while maintaining state of the art performance in image retrieval and object recognition applications. To obtain a more sparse description, a color salient point and scale determination framework is proposed operating on color spaces that have useful perceptual and saliency properties. This allows for the necessary discriminative points to be located, allowing a significant reduction in the number of salient points and obtaining an invariant (repeatability) and discriminative (distinctiveness) image description. Experimental results on large image datasets show that the proposed method obtains state of the art results with the number of salient points reduced by half. This reduction in the number of points allows subsequent operations, such as feature extraction and clustering, to run more efficiently. It is shown that the method provides less ambiguous features, a more compact description of visual data, and therefore a faster classification of visual data.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121774186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dominance detection in face-to-face conversations","authors":"Sergio Escalera, R. M. Martinez, Jordi Vitrià, P. Radeva, M. Anguera","doi":"10.1109/CVPRW.2009.5204267","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204267","url":null,"abstract":"Dominance is referred to the level of influence a person has in a conversation. Dominance is an important research area in social psychology, but the problem of its automatic estimation is a very recent topic in the contexts of social and wearable computing. In this paper, we focus on dominance detection from visual cues. We estimate the correlation among observers by categorizing the dominant people in a set of face-to-face conversations. Different dominance indicators from gestural communication are defined, manually annotated, and compared to the observers opinion. Moreover, the considered indicators are automatically extracted from video sequences and learnt by using binary classifiers. Results from the three analysis shows a high correlation and allows the categorization of dominant people in public discussion video sequences.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122516710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}