{"title":"Fast vehicle detection with probabilistic feature grouping and its application to vehicle tracking","authors":"Zuwhan Kim, Jitendra Malik","doi":"10.1109/ICCV.2003.1238392","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238392","url":null,"abstract":"Generating vehicle trajectories from video data is an important application of ITS (intelligent transportation systems). We introduce a new tracking approach which uses model-based 3-D vehicle detection and description algorithm. Our vehicle detection and description algorithm is based on a probabilistic line feature grouping, and it is faster (by up to an order of magnitude) and more flexible than previous image-based algorithms. We present the system implementation and the vehicle detection and tracking results.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128491036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel approach for texture shape recovery","authors":"Jing Wang, Kristin J. Dana","doi":"10.1109/ICCV.2003.1238650","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238650","url":null,"abstract":"In vision and graphics, there is a sustained interest in capturing accurate 3D shape with various scanning devices. However, the resulting geometric representation is only part of the story. Surface texture of real objects is also an important component of the representation and fine-scale surface geometry such as surface markings, roughness, and imprints, are essential in highly realistic rendering and accurate prediction. We present a novel approach for measuring the fine-scale surface shape of specular surfaces using a curved mirror to view multiple angles in a single image. A distinguishing aspect of our method is that it is designed for specular surfaces, unlike many methods (e.g. laser scanning) which cannot handle highly specular objects. Also, the spatial resolution is very high so that it can resolve very small surface details that are beyond the resolution of standard devices. Furthermore, our approach incorporates the simultaneous use of a bidirectional texture measurement method, so that spatially varying bidirectional reflectance is measured at the same time as surface shape.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124261514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-scale generative model for animate shapes and parts","authors":"A. Dubinskiy, Song-Chun Zhu","doi":"10.1109/ICCV.2003.1238350","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238350","url":null,"abstract":"We present a multiscale generative model for representing animate shapes and extracting meaningful parts of objects. The model assumes that animate shapes (2D simple dosed curves) are formed by a linear superposition of a number of shape bases. These shape bases resemble the multiscale Gabor bases in image pyramid representation, are well localized in both spatial and frequency domains, and form an over-complete dictionary. This model is simpler than the popular B-spline representation since it does not engage a domain partition. Thus it eliminates the interference between adjacent B-spline bases, and becomes a true linear additive model. We pursue the bases by reconstructing the shape in a coarse-to-fine procedure through curve evolution. These shape bases are further organized in a tree-structure, where the bases in each subtree sum up to an intuitive part of the object. To build probabilistic model for a class of objects, we propose a Markov random field model at each level of the tree representation to account for the spatial relationship between bases. Thus the final model integrates a Markov tree (generative) model over scales and a Markov random field over space. We adopt EM-type algorithm for learning the meaningful parts for a shape class, and show some results on shape synthesis.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114023803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outlier correction in image sequences for the affine camera","authors":"D. Huynh, R. Hartley, A. Heyden","doi":"10.1109/ICCV.2003.1238400","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238400","url":null,"abstract":"It is widely known that, for the affine camera model, both shape and motion can be factorized directly from the so-called image measurement matrix constructed from image point coordinates. The ability to extract both shape and motion from this matrix by a single SVD operation makes this shape-from-motion approach attractive; however, it can not deal with missing feature points and, in the presence of outliers, a direct SVD to the matrix would yield highly unreliable shape and motion components. Here, we present an outlier correction scheme that iteratively updates the elements of the image measurement matrix. The magnitude and sign of the update to each element is dependent upon the residual robustly estimated in each iteration. The result is that outliers are corrected and retained, giving improved reconstruction and smaller reprojection errors. Our iterative outlier correction scheme has been applied to both synthesized and real video sequences. The results obtained are remarkably good.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124095401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using temporal coherence to build models of animals","authors":"Deva Ramanan, D. Forsyth","doi":"10.1109/ICCV.2003.1238364","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238364","url":null,"abstract":"We describe a system that can build appearance models of animals automatically from a video sequence of the relevant animal with no explicit supervisory information. The video sequence need not have any form of special background. Animals are modeled as a 2D kinematic chain of rectangular segments, where the number of segments and the topology of the chain are unknown. The system detects possible segments, clusters segments whose appearance is coherent over time, and then builds a spatial model of such segment clusters. The resulting representation of the spatial configuration of the animal in each frame can be seen either as a track - in which case the system described should be viewed as a generalized tracker, that is capable of modeling objects while tracking them - or as the source of an appearance model which can be used to build detectors for the particular animal. This is because knowing a video sequence is temporally coherent - i.e. that a particular animal is present through the sequence - is a strong supervisory signal. The method is shown to be successful as a tracker on video sequences of real scenes showing three different animals. For the same reason it is successful as a tracker, the method results in detectors that can be used to find each animal fairly reliably within the Corel collection of images.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117025042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shape gradients for histogram segmentation using active contours","authors":"S. Jehan-Besson, M. Barlaud, G. Aubert, O. Faugeras","doi":"10.1109/ICCV.2003.1238375","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238375","url":null,"abstract":"We consider the problem of image segmentation using active contours through the minimization of an energy criterion involving both region and boundary functionals. These functionals are derived through a shape derivative approach instead of classical calculus of variation. The equations can be elegantly derived without converting the region integrals into boundary integrals. From the derivative, we deduce the evolution equation of an active contour that makes it evolve towards a minimum of the criterion. We focus more particularly on statistical features globally attached to the region and especially to the probability density functions of image features such as the color histogram of a region. A theoretical framework is set for the minimization of the distance between two histograms for matching or tracking purposes. An application of this framework to the segmentation of color histograms in video sequences is then proposed. We briefly describe our numerical scheme and show some experimental results.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122624233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial expression understanding in image sequences using dynamic and active visual information fusion","authors":"Yongmian Zhang, Q. Ji","doi":"10.1109/ICCV.2003.1238640","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238640","url":null,"abstract":"This paper explores the use of multisensory information fusion technique with dynamic Bayesian networks (DBNs) for modeling and understanding the temporal behaviors of facial expressions in image sequences. Our approach to the facial expression understanding lies in a probabilistic framework by integrating the DBNs with the facial action units (AUs) from psychological view. The DBNs provide a coherent and unified hierarchical probabilistic framework to represent spatial and temporal information related to facial expressions, and to actively select the most informative visual cues from the available information to minimize the ambiguity in recognition. The recognition of facial expressions is accomplished by fusing not only from the current visual observations, but also from the previous visual evidences. Consequently, the recognition becomes more robust and accurate through modeling the temporal behavior of facial expressions. Experimental results demonstrate that our approach is more admissible for facial expression analysis in image sequences.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123266578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic video summarization by graph modeling","authors":"C. Ngo, Yu-Fei Ma, HongJiang Zhang","doi":"10.1109/ICCV.2003.1238320","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238320","url":null,"abstract":"We propose a unified approach for summarization based on the analysis of video structures and video highlights. Our approach emphasizes both the content balance and perceptual quality of a summary. Normalized cut algorithm is employed to globally and optimally partition a video into clusters. A motion attention model based on human perception is employed to compute the perceptual quality of shots and clusters. The clusters, together with the computed attention values, form a temporal graph similar to Markov chain that inherently describes the evolution and perceptual importance of video clusters. In our application, the flow of a temporal graph is utilized to group similar clusters into scenes, while the attention values are used as guidelines to select appropriate subshots in scenes for summarization.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128718155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fragmentation in the vision of scenes","authors":"J. Geusebroek, A. Smeulders","doi":"10.1109/ICCV.2003.1238326","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238326","url":null,"abstract":"Natural images are highly structured in their spatial configuration. Where one would expect a different spatial distribution for every image, as each image has a different spatial layout, we show that the spatial statistics of recorded images can be explained by a single process of sequential fragmentation. The observation by a resolution limited sensory system turns out to have a profound influence on the observed statistics of natural images. The power-law and normal distribution represent the extreme cases of sequential fragmentation. Between these two extremes, spatial detail statistics deform from power-law to normal through the Weibull type distribution as receptive field size increases relative to image detail size.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124594631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph partition by Swendsen-Wang cuts","authors":"Adrian Barbu, Song-Chun Zhu","doi":"10.1109/ICCV.2003.1238362","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238362","url":null,"abstract":"Vision tasks, such as segmentation, grouping, recognition, can be formulated as graph partition problems. The recent literature witnessed two popular graph cut algorithms: the Ncut using spectral graph analysis and the minimum-cut using the maximum flow algorithm. We present a third major approach by generalizing the Swendsen-Wang method - a well celebrated algorithm in statistical mechanics. Our algorithm simulates ergodic, reversible Markov chain jumps in the space of graph partitions to sample a posterior probability. At each step, the algorithm splits, merges, or regroups a sizable subgraph, and achieves fast mixing at low temperature enabling a fast annealing procedure. Experiments show it converges in 2-30 seconds on a PC for image segmentation. This is 400 times faster than the single-site update Gibbs sampler, and 20-40 times faster than the DDMCMC algorithm. The algorithm can optimize over the number of models and works for general forms of posterior probabilities, so it is more general than the existing graph cut approaches.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"178 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120883301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}