{"title":"Modelling faces dynamically across views and over time","authors":"Yongmin Li, S. Gong, H. Liddell","doi":"10.1109/ICCV.2001.937565","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937565","url":null,"abstract":"A comprehensive novel multi-view dynamic face model is presented in this paper to address two challenging problems in face recognition and facial analysis: modelling faces with large pose variation and modelling faces dynamically in video sequences. The model consists of a sparse 3D shape model learnt from 2D images, a shape-and-pose-free texture model, and an affine geometrical model. Model fitting is performed by optimising (1) a global fitting criterion on the overall face appearance while it changes across views and over time, (2) a local fitting criterion on a set of landmarks, and (3) a temporal fitting criterion between successive frames in a video sequence. By temporally estimating the model parameters over a sequence input, the identity and geometrical information of a face is extracted separately. The former is crucial to face recognition and facial analysis. The latter is used to aid tracking and aligning faces. We demonstrate the results of successfully applying this model on faces with large variation of pose and expression over time.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132251235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical context priming for object detection","authors":"A. Torralba, P. Sinha","doi":"10.1109/ICCV.2001.937604","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937604","url":null,"abstract":"There is general consensus that context can be a rich source of information about an object's identity, location and scale. However the issue of how to formalize centextual influences is still largely open. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties. We represent global context information in terms of the spatial layout of spectral components. The resulting scheme serves as an effective procedure for context driven focus of attention and scale-selection on real-world scenes. Based on a simple holistic analysis of an image, the scheme is able to accurately predict object locations and sizes.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128275710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for segmentation of talk and game shows","authors":"O. Javed, Z. Rasheed, M. Shah","doi":"10.1109/ICCV.2001.937671","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937671","url":null,"abstract":"In this paper, we present a method to remove commercials from talk and game show videos and to segment these videos into host and guest shots. In our approach, we mainly rely on information contained in shot transitions, rather than analyzing the scene content of individual frames. We utilize the inherent differences in scene structure of commercials and talk shows to differentiate between them. Similarly, we make use of the well-defined structure of talk shows, which can be exploited to classify shots as host or guest shots. The entire show is first segmented into camera shots based on color histogram. Then, we construct a data-structure (shot connectivity graph) which links similar shots over time. Analysis of the shot connectivity graph helps us to automatically separate commercials from program segments. This is done by first detecting stories, and then assigning a weight to each story based on its likelihood of being a commercial. Further analysis on stories is done to distinguish shots of the hosts from shots of the guests. We have tested our approach on several full-length shows (including commercials) and have achieved video segmentation with high accuracy. The whole scheme is fast and works even on low quality video (160/spl times/120 pixel images at 5 Hz).","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. 
ICCV 2001","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134552263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse PCA. Extracting multi-scale structure from data","authors":"C. Chennubhotla, A. Jepson","doi":"10.1109/ICCV.2001.937579","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937579","url":null,"abstract":"Sparse Principal Component Analysis (S-PCA) is a novel framework for learning a linear, orthonormal basis representation for structure intrinsic to an ensemble of images. S-PCA is based on the discovery that natural images exhibit structure in a low-dimensional subspace in a sparse, scale-dependent form. The S-PCA basis optimizes an objective function which trades off correlations among output coefficients for sparsity in the description of basis vector elements. This objective function is minimized by a simple, robust and highly scalable adaptation algorithm, consisting of successive planar rotations of pairs of basis vectors. The formulation of S-PCA is novel in that multi-scale representations emerge for a variety of ensembles including face images, images from outdoor scenes and a database of optical flow vectors representing a motion class.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133592237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning low dimensional invariant signature of 3-D object under varying view and illumination from 2-D appearances","authors":"S. Li, Jie Yan, Xinwen Hou, ZeYu Li, HongJiang Zhang","doi":"10.1109/ICCV.2001.937578","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937578","url":null,"abstract":"In this paper, we propose an invariant signature representation for appearances of 3-D object under varying view and illumination, and a method for learning the signature from multi-view appearance examples. The signature, a nonlinear feature, provides a good basis for 3-D object detection and pose estimation due to its following properties. (I) Its location in the signature feature space is a simple function of the view and is insensitive or invariant to illumination. (2) It changes continuously as the view changes, so that the object appearances at all possible views should constitute a known simple curve segment (manifold) in the feature space. (3) The coordinates of rite object appearances in the feature space are correlated in a known way according to a predefined function of the view. The first two properties provide a basis for object detection and the third for view (pose) estimation. To compute the signature representation from input, we present a nonlinear regression method for learning a nonlinear mapping from the input (e.g. image) space to the feature space. The ideas of the signature representation and the learning method are illustrated with experimental results for the object of human face. It is shown that the face object can be effectively, modeled compactly in a 10-D nonlinear feature space. The 10-D signature presents excellent insensitivity to changes in illumination for any view. The correlation of the signature coordinates is well determined by the predefined parametric function. 
Applications of the proposed method in face detection and pose estimation are demonstrated.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134323346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloning your own face with a desktop camera","authors":"Zhengyou Zhang, Zicheng Liu, Dennis Adler, Michael F. Cohen, E. Hanson, Ying Shan","doi":"10.1109/ICCV.2001.937707","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937707","url":null,"abstract":"We have developed an easy and cost-effective system that constructs textured 3D animated face models from videos with minimal user interaction. Our system first takes, with an ordinary video camera, images of a face of a person sitting in front of the camera turning the head from one side to the other. After five manual clicks on two images to tell the system where the eye corners, nose top and mouth corners are, the system automatically generates a realistic looking 3D human head model and the constructed model can be animated immediately (different poses, facial expressions and talking). A user, with a PC and a video camera, can use our system to generate hisher face model in a few minutes. The face model can then be imported in hisher favorite game, and the user sees themselves and their friends take part in the game they are playing. We will demonstrate the system on a laptop computer live at the conference, and participants can try it to model their own faces.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116190795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D object tracking using shape-encoded particle propagation","authors":"H. Moon, R. Chellappa, A. Rosenfeld","doi":"10.1109/ICCV.2001.937641","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937641","url":null,"abstract":"We present a comprehensive treatment of 3D object tracking by posing it as a nonlinear state estimation problem. The measurements are derived using the outputs of shape-encoded filters. The nonlinear state estimation is performed by solving the Zakai equation, and we use the branching particle propagation method for computing the solution. The unnormalized conditional density for the solution to the Zakai equation is realized by the weight of the particle. We first sample a set of particles approximating the initial distribution of the state vector conditioned on the observations, where each particle encodes the set of geometric parameters of the object. The weight of the particle represents geometric and temporal fit, which is computed bottom-up from the raw image using a shape-encoded filter. The particles branch so that the mean number of offspring is proportional to the weight. Time update is handled by employing a second-order motion model, combined with local stochastic search to minimize the prediction error. The prediction adjustment suggested by system identification theory is empirically verified to contribute to global stability. The amount of diffusion is effectively adjusted using a Kalman updating of the covariance matrix. WE have successfully applied this method to human head tracking, where we estimate head motion and compute structure using simple head and facial feature models.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. 
ICCV 2001","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122296461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The KGBR viewpoint-lighting ambiguity and its resolution by generic constraints","authors":"A. Yuille, J. Coughlan, S. Konishi","doi":"10.1109/ICCV.2001.937650","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937650","url":null,"abstract":"We describe a novel viewpoint-lighting ambiguity which we call the KGBR. This ambiguity assumes orthographic projecting or an affine camera, and uses Lambertian reflectance functions including case/attached shadows and multiple light sources. A KGBR transform alters the geometry (by a three-dimensional affine transformation) and albedo properties of objects. If two objects are related by a KGBR transform then for any viewpoint and lighting of the first object there exists a corresponding viewpoint and lighting of the second object so that the images are identical up to an affine transformation. The Generalized Bas Relief (GBR) ambiguity is obtained as a special case of the KGBR. We describe generic viewpoint and lighting assumptions and show that either, or both, resolve this ambiguity by biasing towards objects with planar geometry.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125018143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Euclidean reconstruction and auto-calibration from continuous motion","authors":"Fredrik Kahl, A. Heyden","doi":"10.1109/ICCV.2001.937677","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937677","url":null,"abstract":"This paper deals with the problem of incorporating natural regularity conditions on the motion in an MAP estimator for structure and motion recovery from uncalibrated image sequences. The purpose of incorporating these constraints is to increase performance and robustness. Auto-calibration and structure and motion algorithms are known to have problems with (i) the frequently occurring critical camera motions, (ii) local minima in the non-linear optimization and (iii) the high correlation between different intrinsic and extrinsic parameters of the camera, e.g. the coupling between focal length and camera position. The camera motion (both intrinsic and extrinsic parameters) is modelled as a random walk process, where the inter-frame motions are assumed to be independently normally distributed. The proposed scheme is demonstrated on both simulated and real data showing the increased performance.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130132904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Caustics of catadioptric cameras","authors":"R. Swaminathan, M. Grossberg, S. Nayar","doi":"10.1109/ICCV.2001.937581","DOIUrl":"https://doi.org/10.1109/ICCV.2001.937581","url":null,"abstract":"Conventional vision systems and algorithms assume the camera to have a single viewpoint. However, sensors need not always maintain a single viewpoint. For instance, an incorrectly aligned system could cause non-single viewpoints. Also, systems could be designed to specifically deviate from a single viewpoint to trade-off image characteristics such as resolution and field of view. In these cases, the locus of viewpoints forms what is called a caustic. In this paper, we present an in-depth analysis of caustics of catadioptric cameras with conic reflectors. Properties of caustics with respect to field of view and resolution are presented. Finally, we present ways to calibrate conic catadioptric systems and estimate their caustics from known camera motion.","PeriodicalId":429441,"journal":{"name":"Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127083926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}