{"title":"Structure Preserving Object Tracking","authors":"Lu Zhang, L. Maaten","doi":"10.1109/CVPR.2013.240","DOIUrl":"https://doi.org/10.1109/CVPR.2013.240","url":null,"abstract":"Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"64 1","pages":"1838-1845"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89934958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras","authors":"Ju Shen, S. Cheung","doi":"10.1109/CVPR.2013.157","DOIUrl":"https://doi.org/10.1109/CVPR.2013.157","url":null,"abstract":"The recent popularity of structured-light depth sensors has enabled many new applications from gesture-based user interface to 3D reconstructions. The quality of the depth measurements of these systems, however, is far from perfect. Some depth values can have significant errors, while others can be missing altogether. The uncertainty in depth measurements among these sensors can significantly degrade the performance of any subsequent vision processing. In this paper, we propose a novel probabilistic model to capture various types of uncertainties in the depth measurement process among structured-light systems. The key to our model is the use of depth layers to account for the differences between foreground objects and background scene, the missing depth value phenomenon, and the correlation between color and depth channels. The depth layer labeling is solved as a maximum a-posteriori estimation problem, and a Markov Random Field attuned to the uncertainty in measurements is used to spatially smooth the labeling process. Using the depth-layer labels, we propose a depth correction and completion algorithm that outperforms other techniques in the literature.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"1187-1194"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76838641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Inference Machines","authors":"Roman Shapovalov, D. Vetrov, Pushmeet Kohli","doi":"10.1109/CVPR.2013.384","DOIUrl":"https://doi.org/10.1109/CVPR.2013.384","url":null,"abstract":"This paper addresses the problem of semantic segmentation of 3D point clouds. We extend the inference machines framework of Ross et al. by adding spatial factors that model mid-range and long-range dependencies inherent in the data. The new model is able to account for semantic spatial context. During training, our method automatically isolates and retains factors modelling spatial dependencies between variables that are relevant for achieving higher prediction accuracy. We evaluate the proposed method by using it to predict 17-category semantic segmentations on sets of stitched Kinect scans. Experimental results show that the spatial dependencies learned by our method significantly improve the accuracy of segmentation. They also show that our method outperforms the existing segmentation technique of Koppula et al.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"85 1","pages":"2985-2992"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77020217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing Layers in Image Collections with Componential Models: From the Layered Epitome to the Componential Counting Grid","authors":"A. Perina, N. Jojic","doi":"10.1109/CVPR.2013.71","DOIUrl":"https://doi.org/10.1109/CVPR.2013.71","url":null,"abstract":"Recently, the Counting Grid (CG) model was developed to represent each input image as a point in a large grid of feature counts. This latent point is a corner of a window of grid points which are all uniformly combined to match the (normalized) feature counts in the image. Being a bag of word model with spatial layout in the latent space, the CG model has superior handling of field of view changes in comparison to other bag of word models, but with the price of being essentially a mixture, mapping each scene to a single window in the grid. In this paper we introduce a family of componential models, dubbed the Componential Counting Grid, whose members represent each input image by multiple latent locations, rather than just one. In this way, we make a substantially more flexible admixture model which captures layers or parts of images and maps them to separate windows in a Counting Grid. We tested the models on scene and place classification where their componential nature helped to extract objects, to capture parallax effects, thus better fitting the data and outperforming Counting Grids and Latent Dirichlet Allocation, especially on sequences taken with wearable cameras.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"72 8 1","pages":"500-507"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78055576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Shape Priors for Object Segmentation","authors":"Fei Chen, Huimin Yu, Roland Hu, Xunxun Zeng","doi":"10.1109/CVPR.2013.244","DOIUrl":"https://doi.org/10.1109/CVPR.2013.244","url":null,"abstract":"In this paper we introduce a new shape-driven approach for object segmentation. Given a training set of shapes, we first use deep Boltzmann machine to learn the hierarchical architecture of shape priors. This learned hierarchical architecture is then used to model shape variations of global and local structures in an energetic form. Finally, it is applied to data-driven variational methods to perform object extraction of corrupted data based on shape probabilistic representation. Experiments demonstrate that our model can be applied to dataset of arbitrary prior shapes, and can cope with image noise and clutter, as well as partial occlusions.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"125 1","pages":"1870-1877"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78471559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Joint Object Discovery and Segmentation in Internet Images","authors":"Michael Rubinstein, Armand Joulin, J. Kopf, Ce Liu","doi":"10.1109/CVPR.2013.253","DOIUrl":"https://doi.org/10.1109/CVPR.2013.253","url":null,"abstract":"We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as typical for datasets collected from Internet search. The key insight to our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across other images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"69 1","pages":"1939-1946"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76064512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Dominant and Anomalous Behavior Detection in Videos","authors":"M. J. Roshtkhari, M. Levine","doi":"10.1109/CVPR.2013.337","DOIUrl":"https://doi.org/10.1109/CVPR.2013.337","url":null,"abstract":"We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviors in surveillance videos. Dominant behaviors are those occurring frequently in videos and hence, usually do not attract much attention. They can be characterized by different complexities in space and time, ranging from a scene background to human activities. In contrast, an anomalous behavior is defined as having a low likelihood of occurrence. We do not employ any models of the entities in the scene in order to detect these two kinds of behaviors. In this paper, video events are learnt at each pixel without supervision using densely constructed spatio-temporal video volumes. Furthermore, the volumes are organized into large contextual graphs. These compositions are employed to construct a hierarchical codebook model for the dominant behaviors. By decomposing spatio-temporal contextual information into unique spatial and temporal contexts, the proposed framework learns the models of the dominant spatial and temporal events. Thus, it is ultimately capable of simultaneously modeling high-level behaviors as well as low-level spatial, temporal and spatio-temporal pixel level changes.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"488 1","pages":"2611-2618"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76108053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expanded Parts Model for Human Attribute and Action Recognition in Still Images","authors":"Gaurav Sharma, F. Jurie, C. Schmid","doi":"10.1109/CVPR.2013.90","DOIUrl":"https://doi.org/10.1109/CVPR.2013.90","url":null,"abstract":"We propose a new model for recognizing human attributes (e.g. wearing a suit, sitting, short hair) and actions (e.g. running, riding a horse) in still images. The proposed model relies on a collection of part templates which are learnt discriminatively to explain specific scale-space locations in the images (in human centric coordinates). It avoids the limitations of highly structured models, which consist of a few (i.e. a mixture of) 'average' templates. To learn our model, we propose an algorithm which automatically mines out parts and learns corresponding discriminative templates with their respective locations from a large number of candidate parts. We validate the method on recent challenging datasets: (i) Willow 7 actions [7], (ii) 27 Human Attributes (HAT) [25], and (iii) Stanford 40 actions [37]. We obtain convincing qualitative and state-of-the-art quantitative results on the three datasets.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"18 1","pages":"652-659"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75713129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems","authors":"Byung-Woo Hong, Zhaojin Lu, G. Sundaramoorthi","doi":"10.1109/CVPR.2013.161","DOIUrl":"https://doi.org/10.1109/CVPR.2013.161","url":null,"abstract":"In this work, we address the multi-label Mumford-Shah problem, i.e., the problem of jointly estimating a partitioning of the domain of the image, and functions defined within regions of the partition. We create algorithms that are efficient, robust to undesirable local minima, and are easy-to-implement. Our algorithms are formulated by slightly modifying the underlying statistical model from which the multi-label Mumford-Shah functional is derived. The advantage of this statistical model is that the underlying variables: the labels and the functions are less coupled than in the original formulation, and the labels can be computed from the functions with more global updates. The resulting algorithms can be tuned to the desired level of locality of the solution: from fully global updates to more local updates. We demonstrate our algorithm on two applications: joint multi-label segmentation and denoising, and joint multi-label motion segmentation and flow estimation. We compare to the state-of-the-art in multi-label Mumford-Shah problems and show that we achieve more promising results.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"8 1","pages":"1219-1226"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72977621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exemplar-Based Face Parsing","authors":"Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe L. Lin, Jianchao Yang","doi":"10.1109/CVPR.2013.447","DOIUrl":"https://doi.org/10.1109/CVPR.2013.447","url":null,"abstract":"In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database, Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results, that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"85 1","pages":"3484-3491"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73156453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}