{"title":"Single-Shot Time-of-Flight Phase Unwrapping Using Two Modulation Frequencies","authors":"Changpeng Ti, Ruigang Yang, James Davis","doi":"10.1109/3DV.2016.74","DOIUrl":"https://doi.org/10.1109/3DV.2016.74","url":null,"abstract":"We present a novel phase unwrapping framework for the Time-of-Flight sensor that can match the performance of systems using two modulation frequencies, within a single shot. Our framework is based on an interleaved pixel arrangement, where a pixel measures phase at a different modulation frequency from its neighboring pixels. We demonstrate that: (1) it is practical to capture ToF images that contain phases from two frequencies in a single shot, with no loss in signal fidelity, (2) phase unwrapping can be effectively performed on such an interleaved phase image, and (3) our method preserves the original spatial resolution. We find that the output of our framework is comparable to results using two shots under separate modulation frequencies, and is significantly better than using a single modulation frequency.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124750416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Depth Restoration Occlusionless Temporal Dataset","authors":"Daniel Rotman, Guy Gilboa","doi":"10.1109/3DV.2016.26","DOIUrl":"https://doi.org/10.1109/3DV.2016.26","url":null,"abstract":"Depth restoration, the task of correcting depth noise and artifacts, has recently risen in popularity due to the increase in commodity depth cameras. When assessing the quality of existing methods, most researchers resort to the popular Middlebury dataset, however, this dataset was not created for depth enhancement, and therefore lacks the option of comparing genuine low-quality depth images with their high-quality, ground-truth counterparts. To address this shortcoming, we present the Depth Restoration Occlusionless Temporal (DROT) dataset. This dataset offers real depth sensor input coupled with registered pixel-to-pixel color images, and the ground-truth depth to which we wish to compare. Our dataset includes not only Kinect 1 and Kinect 2 data, but also an Intel R200 sensor intended for integration into hand-held devices. Beyond this, we present a new temporal depth-restoration method. Utilizing multiple frames, we create a number of possibilities for an initial degraded depth map, which allows us to arrive at a more educated decision when refining depth images. Evaluating this method with our dataset shows significant benefits, particularly for overcoming real sensor-noise artifacts.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122063429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quaternionic Upsampling: Hyperspherical Techniques for 6 DoF Pose Tracking","authors":"Benjamin Busam, M. Esposito, B. Frisch, Nassir Navab","doi":"10.1109/3DV.2016.71","DOIUrl":"https://doi.org/10.1109/3DV.2016.71","url":null,"abstract":"Fast real-time tracking is an integral component of modern 3D computer vision pipelines. Despite their advantages in accuracy and reliability, optical trackers suffer from limited acquisition rates depending either on intrinsic sensor capabilities or physical limitations such as exposure time. Moreover, data transmission and image processing produce latency in the pose stream. We introduce quaternionic upsampling to overcome these problems. The technique models the pose parameters as points on multidimensional hyperspheres in (dual) quaternion space. In order to upsample the pose stream, we present several methods to sample points on geodesics and piecewise continuous curves on these manifolds and compare them regarding accuracy and computation efficiency. With the unified approach of quaternionic upsampling, both interpolation and extrapolation in pose space can be done by continuous linear variation of only one sampling parameter. Since the method can be implemented rather efficiently, pose rates of over 4 kHz and future pose predictions with an accuracy of 128 μm and 0.5° are possible in real-time. The method does not depend on a special tracking algorithm and can thus be used for any arbitrary 3 DoF or 6 DoF rotation or pose tracking system.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122205990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Plane-Based Calibration of Multiple Non-Overlapping Cameras","authors":"Chen Zhu, Zihan Zhou, Ziran Xing, Yanbing Dong, Yi Ma, Jingyi Yu","doi":"10.1109/3DV.2016.73","DOIUrl":"https://doi.org/10.1109/3DV.2016.73","url":null,"abstract":"The availability of commodity multi-camera systems such as Google Jump, Jaunt, and Lytro Immerge have brought new demand for reliable and efficient extrinsic camera calibration. State-of-the-art solutions generally require that adjacent, if not all, cameras observe a common area or employ known scene structures. In this paper, we present a novel multi-camera calibration technique that eliminates such requirements. Our approach extends the single-pair hand-eye calibration used in robotics to multi-camera systems. Specifically, we make use of (possibly unknown) planar structures in the scene and combine plane-based structure from motion, camera pose estimation, and task-specific bundle adjustment for extrinsic calibration. Experiments on several multi-camera setups demonstrate that our scheme is highly accurate, robust, and efficient.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129800075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthetic Prior Design for Real-Time Face Tracking","authors":"Steven G. McDonagh, M. Klaudiny, D. Bradley, T. Beeler, Iain Matthews, Kenny Mitchell","doi":"10.1109/3DV.2016.72","DOIUrl":"https://doi.org/10.1109/3DV.2016.72","url":null,"abstract":"Real-time facial performance capture has recently been gaining popularity in virtual film production, driven by advances in machine learning, which allows for fast inference of facial geometry from video streams. These learning-based approaches are significantly influenced by the quality and amount of labelled training data. Tedious construction of training sets from real imagery can be replaced by rendering a facial animation rig under on-set conditions expected at runtime. We learn a synthetic actor-specific prior by adapting a state-of-the-art facial tracking method. Synthetic training significantly reduces the capture and annotation burden and in theory allows generation of an arbitrary amount of data. But practical realities such as training time and compute resources still limit the size of any training set. We construct better and smaller training sets by investigating which facial image appearances are crucial for tracking accuracy, covering the dimensions of expression, viewpoint and illumination. A reduction of training data in 1-2 orders of magnitude is demonstrated whilst tracking accuracy is retained for challenging on-set footage.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126399742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Tracking in Low Light and Sudden Illumination Changes","authors":"Hatem Alismail, Brett Browning, S. Lucey","doi":"10.1109/3DV.2016.48","DOIUrl":"https://doi.org/10.1109/3DV.2016.48","url":null,"abstract":"We present an algorithm for robust and real-time visual tracking under challenging illumination conditions characterized by poor lighting as well as sudden and drastic changes in illumination. Robustness is achieved by adapting illumination-invariant binary descriptors to dense image alignment using the Lucas and Kanade algorithm. The proposed adaptation preserves the Hamming distance under least-squares minimization, thus preserving the photometric invariance properties of binary descriptors. Due to the compactness of the descriptor, the algorithm runs in excess of 400 fps on laptops and 100 fps on mobile devices.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129250817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Based Outdoor Performance Capture","authors":"Nadia Robertini, D. Casas, Helge Rhodin, H. Seidel, C. Theobalt","doi":"10.1109/3DV.2016.25","DOIUrl":"https://doi.org/10.1109/3DV.2016.25","url":null,"abstract":"We propose a new model-based method to accurately reconstruct human performances captured outdoors in a multi-camera setup. Starting from a template of the actor model, we introduce a new unified implicit representation for both, articulated skeleton tracking and non-rigid surface shape refinement. Our method fits the template to unsegmented video frames in two stages - first, the coarse skeletal pose is estimated, and subsequently non-rigid surface shape and body pose are jointly refined. Particularly for surface shape refinement we propose a new combination of 3D Gaussians designed to align the projected model with likely silhouette contours without explicit segmentation or edge detection. We obtain reconstructions of much higher quality in outdoor settings than existing methods, and show that we are on par with state-of-the-art methods on indoor scenes for which they were designed.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133183116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiview RGB-D Dataset for Object Instance Detection","authors":"G. Georgakis, Md. Alimoor Reza, Arsalan Mousavian, P. Le, J. Kosecka","doi":"10.1109/3DV.2016.52","DOIUrl":"https://doi.org/10.1109/3DV.2016.52","url":null,"abstract":"This paper presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled and objects in the scenes are annotated with bounding boxes and in the 3D point cloud. Also, an approach for detection and recognition is presented, which is comprised of two parts: (i) a new multi-view 3D proposal generation method and (ii) the development of several recognition baselines using AlexNet to score our proposals, which is trained either on crops of the dataset or on synthetically composited training images. Finally, we compare the performance of the object proposals and a detection baseline to the Washington RGB-D Scenes (WRGB-D) dataset and demonstrate that our Kitchen scenes dataset is more challenging for object detection and recognition. The dataset is available at: http://cs.gmu.edu/~robot/gmu-kitchens.html.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128180600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Single Shot Detection and Pose Estimation","authors":"Patrick Poirson, Phil Ammirato, Cheng-Yang Fu, Wei Liu, J. Kosecka, A. Berg","doi":"10.1109/3DV.2016.78","DOIUrl":"https://doi.org/10.1109/3DV.2016.78","url":null,"abstract":"For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper we build on a recent state-of-the-art convolutional network for sliding-window detection [10] to provide detection and rough pose estimation in a single shot, without intermediate stages of detecting parts or initial bounding boxes. While not the first system to treat pose estimation as a categorization problem, this is the first attempt to combine detection and pose estimation at the same level using a deep learning approach. The key to the architecture is a deep convolutional network where scores for the presence of an object category, the offset for its location, and the approximate pose are all estimated on a regular grid of locations in the image. The resulting system is as accurate as recent work on pose estimation (42.4% 8 View mAVP on Pascal 3D+ [21] ) and significantly faster (46 frames per second (FPS) on a TITAN X GPU). This approach to detection and rough pose estimation is fast and accurate enough to be widely applied as a pre-processing step for tasks including high-accuracy pose estimation, object tracking and localization, and vSLAM.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132554540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistent Discretization and Minimization of the L1 Norm on Manifolds","authors":"A. Bronstein, Yoni Choukroun, R. Kimmel, Matan Sela","doi":"10.1109/3DV.2016.53","DOIUrl":"https://doi.org/10.1109/3DV.2016.53","url":null,"abstract":"The L1 norm has been tremendously popular in signal and image processing in the past two decades due to its sparsity-promoting properties. More recently, its generalization to non-Euclidean domains has been found useful in shape analysis applications. For example, in conjunction with the minimization of the Dirichlet energy, it was shown to produce a compactly supported quasi-harmonic orthonormal basis, dubbed as compressed manifold modes [14]. The continuous L1 norm on the manifold is often replaced by the vector ℓ1 norm applied to sampled functions. We show that such an approach is incorrect in the sense that it does not consistently discretize the continuous norm and warn against its sensitivity to the specific sampling. We propose two alternative discretizations resulting in an iteratively-reweighed ℓ2 norm. We demonstrate the proposed strategy on the compressed modes problem, which reduces to a sequence of simple eigendecomposition problems not requiring non-convex optimization on Stiefel manifolds and producing more stable and accurate results.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121868572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}