{"title":"Spatiotemporal KSVD Dictionary Learning for Online Multi-target Tracking","authors":"H. Manh, G. Alaghband","doi":"10.1109/CRV.2018.00030","DOIUrl":"https://doi.org/10.1109/CRV.2018.00030","url":null,"abstract":"In this paper, we present a new spatiotemporal discriminative KSVD dictionary algorithm (STKSVD) for learning target appearance in an online multi-target tracking system. Unlike other classification/recognition tasks (e.g. face or image recognition), learning a target's appearance in online multi-target tracking is affected by factors such as posture/articulation changes, partial occlusion by the background scene or other targets, and background changes (a human detection bounding box covers both human parts and part of the scene). However, we observe that these variations occur gradually relative to spatial and temporal dynamics. We characterize the spatial and temporal information between a target's samples through a new STKSVD appearance learning algorithm to better discriminate targets. Our STKSVD method learns discriminative sparse codes and linear classifier parameters while minimizing reconstruction error in a single optimization system. Our appearance learning algorithm and tracking framework employ two different methods of calculating the appearance similarity score in each stage of a two-stage association: a linear classifier in the first stage, and minimum residual errors in the second stage. Results on the 2DMOT2015 dataset, using its public Aggregated Channel Features (ACF) human detections for all comparisons, show that our method outperforms existing related learning methods.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127132804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Unsynchronized Unstructured Light","authors":"Chaima El Asmi, S. Roy","doi":"10.1109/CRV.2018.00046","DOIUrl":"https://doi.org/10.1109/CRV.2018.00046","url":null,"abstract":"This paper proposes a new approach to structured light correspondence that alleviates the camera-projector synchronization problem. Until now, great care was required to make sure that each camera image corresponded exactly to the correct pattern in the sequence. This was difficult to achieve with low-cost hardware or large-scale installations. In our method, the projector sends a constant video loop of a selected number of unstructured light patterns at a high frame rate (30 to 60 fps for common hardware), which are captured by a camera without any form of synchronization. The only constraint to satisfy is that the camera and projector frame rates are known. The matching process not only recovers the correct pattern sequence, but is impervious to partial exposures of consecutive patterns as well as rolling shutter effects.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129322351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Scene Models for Visual Localization under Large Viewpoint Changes","authors":"J. Li, Zhaoqi Xu, D. Meger, G. Dudek","doi":"10.1109/CRV.2018.00033","DOIUrl":"https://doi.org/10.1109/CRV.2018.00033","url":null,"abstract":"We propose an approach for camera pose estimation under large viewpoint changes using only 2D RGB images. This enables a mobile robot to relocalize itself with respect to a previously-visited scene when seeing it again from a completely new vantage point. In order to overcome large appearance changes, we integrate a variety of cues, including object detections, vanishing points, structure from motion, and object-to-object context in order to constrain the camera geometry, while simultaneously estimating the 3D pose of covisible objects represented as bounding cuboids. We propose an efficient sampling-based approach that quickly cuts down the high-dimensional search space, and a robust correspondence algorithm that matches covisible objects via inter-object spatial relationships. We validate our approach using the publicly available Sun3D dataset, in which we demonstrate the ability to handle camera translations of up to 5.9 meters and camera rotations of up to 110 degrees.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116041074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manifold Geometry with Fast Automatic Derivatives and Coordinate Frame Semantics Checking in C++","authors":"Leonid Koppel, Steven L. Waslander","doi":"10.1109/CRV.2018.00027","DOIUrl":"https://doi.org/10.1109/CRV.2018.00027","url":null,"abstract":"Computer vision and robotics problems often require representation and estimation of poses on the SE(3) manifold. Developers of algorithms that must run in real time face several time-consuming programming tasks, including deriving and computing analytic derivatives and avoiding mathematical errors when handling poses in multiple coordinate frames. To support rapid and error-free development, we present wave_geometry, a C++ manifold geometry library with two key contributions: expression template-based automatic differentiation and compile-time enforcement of coordinate frame semantics. We contrast the library with existing open source packages and show that it can evaluate Jacobians in forward and reverse mode with little to no runtime overhead compared to hand-coded derivatives. The library is available at https://github.com/wavelab/wave_geometry.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124172087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disparity Filtering with 3D Convolutional Neural Networks","authors":"W. Mao, Minglun Gong","doi":"10.1109/CRV.2018.00042","DOIUrl":"https://doi.org/10.1109/CRV.2018.00042","url":null,"abstract":"Stereo matching is an ill-posed problem, and hence the disparity maps generated are often inaccurate and noisy. To alleviate this problem, a number of approaches have been proposed that output accurate disparity values for selected pixels only. Instead of designing another disparity optimization method for sparse disparity matching, we present a novel disparity filtering step that detects and removes inaccurate matches. Based on 3D convolutional neural networks, our detector is trained directly on 3D matching cost volumes and hence works with different matching cost generation approaches. The experimental results show that it can effectively filter out mismatches while preserving accurate ones. As a result, combining our approach with the simplest Winner-Take-All optimization leads to better performance than most existing sparse stereo matching algorithms on the Middlebury Stereo Evaluation site.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124705807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven Multispectral Image Registration","authors":"Rahat Yasir, M. Eramian, I. Stavness, S. Shirtliffe, H. Duddu","doi":"10.1109/CRV.2018.00040","DOIUrl":"https://doi.org/10.1109/CRV.2018.00040","url":null,"abstract":"Multispectral imaging is widely used in remote sensing applications from UAVs and ground-based platforms. Multispectral cameras often use a physically different camera for each wavelength, causing misalignment between the images for different imaging bands. This misalignment must be corrected prior to concurrent multi-band image analysis. The traditional approach to multispectral image registration is to select a target channel and register all other image channels to that target, but there is no objective, evidence-based method for selecting the target. The possibility of registering via an intermediate channel on the way to the target is not usually considered, yet it could be beneficial when no target channel exists for which direct registration performs well for every other channel. In this paper, we propose an automatic data-driven multispectral image registration framework that determines a target channel, and possible intermediate registration steps, based on the assumptions that 1) some reasonable minimum number of control point correspondences between two channels is needed to ensure a low-error registration; and 2) a greater number of such correspondences generally results in lower registration error. Our prototype is tested on three multispectral datasets captured with UAV-mounted multispectral cameras. The resulting registration schemes had more control point correspondences on average than the traditional register-all-to-one-target-channel approach in all of our experiments. For most channels in our three datasets, our registration schemes produced lower back-projection error than the direct-to-target-channel registration approach.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121347556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Surface-Based GICP","authors":"M. Vlaminck, H. Luong, W. Philips","doi":"10.1109/CRV.2018.00044","DOIUrl":"https://doi.org/10.1109/CRV.2018.00044","url":null,"abstract":"In this paper we present an extension of the Generalized ICP algorithm for the registration of point clouds for use in lidar-based SLAM applications. As opposed to the plane-to-plane cost function, which assumes that each point set is locally planar, we propose to incorporate additional information on the underlying surface into the GICP process. Doing so, we are able to deal better with the artefacts that are typically present in lidar point clouds, including an inhomogeneous and sparse point density, noise and missing data. Experiments on lidar sequences of the KITTI benchmark demonstrate that we are able to substantially reduce the positional error compared to the original GICP algorithm.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132883883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Evaluation of Deep CNN Baselines for Scene-Independent Person Re-identification","authors":"P. Marchwica, Michael Jamieson, P. Siva","doi":"10.1109/CRV.2018.00049","DOIUrl":"https://doi.org/10.1109/CRV.2018.00049","url":null,"abstract":"In recent years, a variety of proposed methods based on deep convolutional neural networks (CNNs) have improved the state of the art for large-scale person re-identification (ReID). While a large number of optimizations and network improvements have been proposed, there has been relatively little evaluation of the influence of training data and baseline network architecture. In particular, it is usually assumed either that networks are trained on labeled data from the deployment location (scene-dependent), or else adapted with unlabeled data, both of which complicate system deployment. In this paper, we investigate the feasibility of achieving scene-independent person ReID by forming a large composite dataset for training. We present an in-depth comparison of several CNN baseline architectures for both scene-dependent and scene-independent ReID, across a range of training dataset sizes. We show that scene-independent ReID can produce leading-edge results, competitive with unsupervised domain adaption techniques. Finally, we introduce a new dataset for comparing within-camera and across-camera person ReID.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133844818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Walking on Thin Air: Environment-Free Physics-Based Markerless Motion Capture","authors":"M. Livne, L. Sigal, Marcus A. Brubaker, David J. Fleet","doi":"10.1109/CRV.2018.00031","DOIUrl":"https://doi.org/10.1109/CRV.2018.00031","url":null,"abstract":"We propose a generative approach to physics-based motion capture. Unlike prior attempts to incorporate physics into tracking that assume the subject and scene geometry are calibrated and known a priori, our approach is automatic and online. This distinction is important since calibration of the environment is often difficult, especially for motions with props, uneven surfaces, or outdoor scenes. The use of physics in this context provides a natural framework to reason about contact and the plausibility of recovered motions. We propose a fast data-driven parametric body model, based on linear-blend skinning, which decouples deformations due to pose, anthropometrics and body shape. Pose (and shape) parameters are estimated using robust ICP optimization with physics-based dynamic priors that incorporate contact. Contact is estimated from torque trajectories and predictions of which contact points were active. To our knowledge, this is the first approach to take physics into account without explicit a priori knowledge of the environment or body dimensions. We demonstrate effective tracking from a noisy single depth camera, improving on state-of-the-art results quantitatively and producing better qualitative results, reducing visual artifacts like foot-skate and jitter.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122763811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Pyramid CNN for Dense-Leaves Segmentation","authors":"Daniel Morris","doi":"10.1109/CRV.2018.00041","DOIUrl":"https://doi.org/10.1109/CRV.2018.00041","url":null,"abstract":"Automatic detection and segmentation of overlapping leaves in dense foliage can be a difficult task, particularly for leaves with strong textures and high occlusions. We present Dense-Leaves, an image dataset with ground truth segmentation labels that can be used to train and quantify algorithms for leaf segmentation in the wild. We also propose a pyramid convolutional neural network with multi-scale predictions that detects and discriminates leaf boundaries from interior textures. Using these detected boundaries, closed-contour boundaries around individual leaves are estimated with a watershed-based algorithm. The result is an instance segmenter for dense leaves. Promising segmentation results for leaves in dense foliage are obtained.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134192888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}