{"title":"Spatio-temporal clustering of probabilistic region trajectories","authors":"Fabio Galasso, M. Iwasaki, K. Nobori, R. Cipolla","doi":"10.1109/ICCV.2011.6126438","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126438","url":null,"abstract":"We propose a novel model for the spatio-temporal clustering of trajectories based on motion, which applies to challenging street-view video sequences of pedestrians captured by a mobile camera. A key contribution of our work is the introduction of novel probabilistic region trajectories, motivated by the non-repeatability of segmentation of frames in a video sequence. Hierarchical image segments are obtained by using a state-of-the-art hierarchical segmentation algorithm, and connected from adjacent frames in a directed acyclic graph. The region trajectories and measures of confidence are extracted from this graph using a dynamic programming-based optimisation. Our second main contribution is a Bayesian framework with a twofold goal: to learn the optimal, in a maximum likelihood sense, Random Forests classifier of motion patterns based on video features, and construct a unique graph from region trajectories of different frames, lengths and hierarchical levels. Finally, we demonstrate the use of Isomap for effective spatio-temporal clustering of the region trajectories of pedestrians. We support our claims with experimental results on new and existing challenging video sequences.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73565378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Density-aware person detection and tracking in crowds","authors":"Mikel D. Rodriguez, I. Laptev, Josef Sivic, Jean-Yves Audibert","doi":"10.1109/ICCV.2011.6126526","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126526","url":null,"abstract":"We address the problem of person detection and tracking in crowded video scenes. While the detection of individual objects has been improved significantly over the recent years, crowd scenes remain particularly challenging for the detection and tracking tasks due to heavy occlusions, high person densities and significant variation in people's appearance. To address these challenges, we propose to leverage information on the global structure of the scene and to resolve all detections jointly. In particular, we explore constraints imposed by the crowd density and formulate person detection as the optimization of a joint energy function combining crowd density estimation and the localization of individual people. We demonstrate how the optimization of such an energy function significantly improves person detection and tracking in crowds. We validate our approach on a challenging video dataset of crowded scenes.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72927417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive coupled-layer visual model for robust visual tracking","authors":"Luka Cehovin, M. Kristan, A. Leonardis","doi":"10.1109/ICCV.2011.6126390","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126390","url":null,"abstract":"This paper addresses the problem of tracking objects which undergo rapid and significant appearance changes. We propose a novel coupled-layer visual model that combines the target's global and local appearance. The local layer in this model is a set of local patches that geometrically constrain the changes in the target's appearance. This layer probabilistically adapts to the target's geometric deformation, while its structure is updated by removing and adding the local patches. The addition of the patches is constrained by the global layer that probabilistically models target's global visual properties such as color, shape and apparent local motion. The global visual properties are updated during tracking using the stable patches from the local layer. By this coupled constraint paradigm between the adaptation of the global and the local layer, we achieve a more robust tracking through significant appearance changes. Indeed, the experimental results on challenging sequences confirm that our tracker outperforms the related state-of-the-art trackers by having smaller failure rate as well as better accuracy.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75495119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trajectory reconstruction from non-overlapping surveillance cameras with relative depth ordering constraints","authors":"B. Micusík","doi":"10.1109/ICCV.2011.6126334","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126334","url":null,"abstract":"We present a method for reconstructing a trajectory of an object moving in front of non-overlapping fully or partially calibrated cameras. The non-overlapping setup turns that problem ill-posed as no point correspondences can be established which are necessary for the well known point triangulation. The proposed solution instead builds on the assumption of trajectory smoothness and depth ordering prior information. We propose a novel formulation with a consistent minimization criterion and a way to utilize the depth ordering prior reflected by the size change of a bounding box associated to an image point being tracked. Reconstructing trajectory minimizing the trajectory smoothness, its re-projection error and employing the depth priors is casted as the Second Order Cone Program yielding a global optimum. The new formulation together with the proposed depth prior significantly improves the trajectory reconstruction in sense of accuracy and topology, and speeds up the solver. Synthetic and real experiments validate the feasibility of the proposed approach.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77746341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video from a single coded exposure photograph using a learned over-complete dictionary","authors":"Y. Hitomi, Jinwei Gu, Mohit Gupta, T. Mitsunaga, S. Nayar","doi":"10.1109/ICCV.2011.6126254","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126254","url":null,"abstract":"Cameras face a fundamental tradeoff between the spatial and temporal resolution - digital still cameras can capture images with high spatial resolution, but most high-speed video cameras suffer from low spatial resolution. It is hard to overcome this tradeoff without incurring a significant increase in hardware costs. In this paper, we propose techniques for sampling, representing and reconstructing the space-time volume in order to overcome this tradeoff. Our approach has two important distinctions compared to previous works: (1) we achieve sparse representation of videos by learning an over-complete dictionary on video patches, and (2) we adhere to practical constraints on sampling scheme which is imposed by architectures of present image sensor devices. Consequently, our sampling scheme can be implemented on image sensors by making a straightforward modification to the control unit. To demonstrate the power of our approach, we have implemented a prototype imaging system with per-pixel coded exposure control using a liquid crystal on silicon (LCoS) device. Using both simulations and experiments on a wide range of scenes, we show that our method can effectively reconstruct a video from a single image maintaining high spatial resolution.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80006886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random ensemble metrics for object recognition","authors":"Tatsuo Kozakaya, S. Ito, Susumu Kubota","doi":"10.1109/ICCV.2011.6126466","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126466","url":null,"abstract":"This paper presents a novel and generic approach for metric learning, random ensemble metrics (REMetric). To improve generalization performance, we introduce the concept of ensemble learning to the metric learning scheme. Unlike previous methods, our method does not optimize the global objective function for the whole training data. It learns multiple discriminative projection vectors obtained from linear support vector machines (SVM) using randomly subsampled training data. The final metric matrix is then obtained by integrating these vectors. As a result of using SVM, the learned metric has an excellent scalability for the dimensionality of features. Therefore, it does not require any prior dimensionality reduction techniques such as PCA. Moreover, our method allows us to unify dimensionality reduction and metric learning by controlling the number of the projection vectors. We demonstrate through experiments, that our method can avoid overfitting even though a relatively small number of training data is provided. The experiments are performed with three different datasets; the Viewpoint Invariant Pedestrian Recognition (VIPeR) dataset, the Labeled Face in the Wild (LFW) dataset and the Oxford 102 category flower dataset. The results show that our method achieves equivalent or superior performance compared to existing state-of-the-art metric learning methods.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80199053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Isotonic CCA for sequence alignment and activity recognition","authors":"Shahriar Shariat, V. Pavlovic","doi":"10.1109/ICCV.2011.6126545","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126545","url":null,"abstract":"This paper presents an approach for sequence alignment based on canonical correlation analysis(CCA). We show that a novel set of constraints imposed on traditional CCA leads to canonical solutions with the time warping property, i.e., non-decreasing monotonicity in time. This formulation generalizes the more traditional dynamic time warping (DTW) solutions to cases where the alignment is accomplished on arbitrary subsequence segments, optimally determined from data, instead on individual sequence samples. We then introduce a robust and efficient algorithm to find such alignments using non-negative least squares reductions. Experimental results show that this new method, when applied to MOCAP activity recognition problems, can yield improved recognition accuracy.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79134784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust unsupervised motion pattern inference from video and applications","authors":"Xuemei Zhao, G. Medioni","doi":"10.1109/ICCV.2011.6126308","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126308","url":null,"abstract":"We propose an unsupervised learning framework to infer motion patterns in videos and in turn use them to improve tracking of moving objects in sequences from static cameras. Based on tracklets, we use a manifold learning method Tensor Voting to infer the local geometric structures in (x, y) space, and embed tracklet points into (x, y, θ) space, where θ represents motion direction. In this space, points automatically form intrinsic manifold structures, each of which corresponds to a motion pattern. To define each group, a novel robustmanifold grouping algorithm is proposed. Tensor Voting is performed to provide multiple geometric cues which formulate multiple similarity kernels between any pair of points, and a spectral clustering technique is used in this multiple kernel setting. The grouping algorithm achieves better performance than state-of-the-art methods in our applications. Extracted motion patterns can then be used as a prior to improve the performance of any object tracker. It is especially useful to reduce false alarms and ID switches. Experiments are performed on challenging real-world sequences, and a quantitative analysis of the results shows the framework effectively improves state-of-the-art tracker.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81533417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Annotator rationales for visual recognition","authors":"Jeff Donahue, K. Grauman","doi":"10.1109/ICCV.2011.6126394","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126394","url":null,"abstract":"Traditional supervised visual learning simply asks annotators “what” label an image should have. We propose an approach for image classification problems requiring subjective judgment that also asks “why”, and uses that information to enrich the learned model. We develop two forms of visual annotator rationales: in the first, the annotator highlights the spatial region of interest he found most influential to the label selected, and in the second, he comments on the visual attributes that were most important. For either case, we show how to map the response to synthetic contrast examples, and then exploit an existing large-margin learning technique to refine the decision boundary accordingly. Results on multiple scene categorization and human attractiveness tasks show the promise of our approach, which can more accurately learn complex categories with the explanations behind the label choices.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85005383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fisher Discrimination Dictionary Learning for sparse representation","authors":"Meng Yang, Lei Zhang, Xiangchu Feng, D. Zhang","doi":"10.1109/ICCV.2011.6126286","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126286","url":null,"abstract":"Sparse representation based classification has led to interesting image recognition results, while the dictionary used for sparse coding plays a key role in it. This paper presents a novel dictionary learning (DL) method to improve the pattern classification performance. Based on the Fisher discrimination criterion, a structured dictionary, whose dictionary atoms have correspondence to the class labels, is learned so that the reconstruction error after sparse coding can be used for pattern classification. Meanwhile, the Fisher discrimination criterion is imposed on the coding coefficients so that they have small within-class scatter but big between-class scatter. A new classification scheme associated with the proposed Fisher discrimination DL (FDDL) method is then presented by using both the discriminative information in the reconstruction error and sparse coding coefficients. The proposed FDDL is extensively evaluated on benchmark image databases in comparison with existing sparse representation and DL based classification methods.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85072529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}