{"title":"Manhattan Scene Understanding via XSlit Imaging","authors":"Jinwei Ye, Yu Ji, Jingyi Yu","doi":"10.1109/CVPR.2013.18","DOIUrl":"https://doi.org/10.1109/CVPR.2013.18","url":null,"abstract":"A Manhattan World (MW) [3] is composed of planar surfaces and parallel lines aligned with three mutually orthogonal principal axes. Traditional MW understanding algorithms rely on geometry priors such as the vanishing points and reference (ground) planes for grouping coplanar structures. In this paper, we present a novel single-image MW reconstruction algorithm from the perspective of non-pinhole cameras. We show that by acquiring the MW using an XSlit camera, we can instantly resolve co planarity ambiguities. Specifically, we prove that parallel 3D lines map to 2D curves in an XSlit image and they converge at an XSlit Vanishing Point (XVP). In addition, if the lines are coplanar, their curved images will intersect at a second common pixel that we call Coplanar Common Point (CCP). CCP is a unique image feature in XSlit cameras that does not exist in pinholes. We present a comprehensive theory to analyze XVPs and CCPs in a MW scene and study how to recover 3D geometry in a complex MW scene from XVPs and CCPs. Finally, we build a prototype XSlit camera by using two layers of cylindrical lenses. Experimental results on both synthetic and real data show that our new XSlit-camera-based solution provides an effective and reliable solution for MW understanding.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"47 1","pages":"81-88"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90353546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Group Interactions in Social Clutter","authors":"Ruonan Li, Parker Porfilio, Todd E. Zickler","doi":"10.1109/CVPR.2013.351","DOIUrl":"https://doi.org/10.1109/CVPR.2013.351","url":null,"abstract":"We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space, the extent of their interaction is localized in time, and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pair-wise interactions, and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"74 1","pages":"2722-2729"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86560998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Poselet Key-Framing: A Model for Human Activity Recognition","authors":"Michalis Raptis, L. Sigal","doi":"10.1109/CVPR.2013.342","DOIUrl":"https://doi.org/10.1109/CVPR.2013.342","url":null,"abstract":"In this paper, we develop a new model for recognizing human actions. An action is modeled as a very sparse sequence of temporally local discriminative key frames - collections of partial key-poses of the actor(s), depicting key states in the action sequence. We cast the learning of key frames in a max-margin discriminative framework, where we treat key frames as latent variables. This allows us to (jointly) learn a set of most discriminative key frames while also learning the local temporal context between them. Key frames are encoded using a spatially-localizable pose let-like representation with HoG and BoW components learned from weak annotations, we rely on structured SVM formulation to align our components and mine for hard negatives to boost localization performance. This results in a model that supports spatio-temporal localization and is insensitive to dropped frames or partial observations. We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an on-line streaming setting.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"76 1","pages":"2650-2657"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86639944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representing Videos Using Mid-level Discriminative Patches","authors":"Arpit Jain, A. Gupta, Mikel D. Rodriguez, L. Davis","doi":"10.1109/CVPR.2013.332","DOIUrl":"https://doi.org/10.1109/CVPR.2013.332","url":null,"abstract":"How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification where they demonstrate state-of-the-art performance on UCF50 and Olympics datasets.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"3 1","pages":"2571-2578"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88028625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity","authors":"Abed C. Malti, R. Hartley, A. Bartoli, Jae-Hak Kim","doi":"10.1109/CVPR.2013.200","DOIUrl":"https://doi.org/10.1109/CVPR.2013.200","url":null,"abstract":"We propose a new approach for template-based extensible surface reconstruction from a single view. We extend the method of isometric surface reconstruction and more recent work on conformal surface reconstruction. Our approach relies on the minimization of a proposed stretching energy formalized with respect to the Poisson ratio parameter of the surface. We derive a patch-based formulation of this stretching energy by assuming local linear elasticity. This formulation unifies geometrical and mechanical constraints in a single energy term. We prevent local scale ambiguities by imposing a set of fixed boundary 3D points. We experimentally prove the sufficiency of this set of boundary points and demonstrate the effectiveness of our approach on different developable and non-developable surfaces with a wide range of extensibility.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"34 1","pages":"1522-1529"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87967267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative Non-blind Deblurring","authors":"Uwe Schmidt, C. Rother, Sebastian Nowozin, Jeremy Jancsary, S. Roth","doi":"10.1109/CVPR.2013.84","DOIUrl":"https://doi.org/10.1109/CVPR.2013.84","url":null,"abstract":"Non-blind deblurring is an integral component of blind approaches for removing image blur due to camera shake. Even though learning-based deblurring methods exist, they have been limited to the generative case and are computationally expensive. To this date, manually-defined models are thus most widely used, though limiting the attained restoration quality. We address this gap by proposing a discriminative approach for non-blind deblurring. One key challenge is that the blur kernel in use at test time is not known in advance. To address this, we analyze existing approaches that use half-quadratic regularization. From this analysis, we derive a discriminative model cascade for image deblurring. Our cascade model consists of a Gaussian CRF at each stage, based on the recently introduced regression tree fields. We train our model by loss minimization and use synthetically generated blur kernels to generate training data. Our experiments show that the proposed approach is efficient and yields state-of-the-art restoration quality on images corrupted with synthetic and real blur.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"43 1","pages":"604-611"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88695509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Class-to-Image Distance with Object Matchings","authors":"Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori","doi":"10.1109/CVPR.2013.108","DOIUrl":"https://doi.org/10.1109/CVPR.2013.108","url":null,"abstract":"We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in training images for an image class are treated as a collage. When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"214 1","pages":"795-802"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89021848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeoF: Geodesic Forests for Learning Coupled Predictors","authors":"P. Kontschieder, Pushmeet Kohli, J. Shotton, A. Criminisi","doi":"10.1109/CVPR.2013.16","DOIUrl":"https://doi.org/10.1109/CVPR.2013.16","url":null,"abstract":"Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently [3, 5, 8]. This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. Random field models, instead, encourage spatial consistency of labels at increased computational expense. This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. Our model can be thought of as a generalization of the successful Semantic Texton Forest, Auto-Context, and Entangled Forest models. A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. Our GeoF model is validated quantitatively on the task of semantic image segmentation, on four challenging and very diverse image datasets. GeoF outperforms both state of-the-art forest models and the conventional pair wise CRF.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"61 11 1","pages":"65-72"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83546441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Displacement Optical Flow from Nearest Neighbor Fields","authors":"Zhuoyuan Chen, Hailin Jin, Zhe L. Lin, Scott D. Cohen, Ying Wu","doi":"10.1109/CVPR.2013.316","DOIUrl":"https://doi.org/10.1109/CVPR.2013.316","url":null,"abstract":"We present an optical flow algorithm for large displacement motions. Most existing optical flow methods use the standard coarse-to-fine framework to deal with large displacement motions which has intrinsic limitations. Instead, we formulate the motion estimation problem as a motion segmentation problem. We use approximate nearest neighbor fields to compute an initial motion field and use a robust algorithm to compute a set of similarity transformations as the motion candidates for segmentation. To account for deviations from similarity transformations, we add local deformations in the segmentation process. We also observe that small objects can be better recovered using translations as the motion candidates. We fuse the motion results obtained under similarity transformations and under translations together before a final refinement. Experimental validation shows that our method can successfully handle large displacement motions. Although we particularly focus on large displacement motions in this work, we make no sacrifice in terms of overall performance. In particular, our method ranks at the top of the Middlebury benchmark.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"44 1","pages":"2443-2450"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78886969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrinsic Characterization of Dynamic Surfaces","authors":"Tony Tung, T. Matsuyama","doi":"10.1109/CVPR.2013.37","DOIUrl":"https://doi.org/10.1109/CVPR.2013.37","url":null,"abstract":"This paper presents a novel approach to characterize deformable surface using intrinsic property dynamics. 3D dynamic surfaces representing humans in motion can be obtained using multiple view stereo reconstruction methods or depth cameras. Nowadays these technologies have become capable to capture surface variations in real-time, and give details such as clothing wrinkles and deformations. Assuming repetitive patterns in the deformations, we propose to model complex surface variations using sets of linear dynamical systems (LDS) where observations across time are given by surface intrinsic properties such as local curvatures. We introduce an approach based on bags of dynamical systems, where each surface feature to be represented in the codebook is modeled by a set of LDS equipped with timing structure. Experiments are performed on datasets of real-world dynamical surfaces and show compelling results for description, classification and segmentation.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"16 1","pages":"233-240"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80104596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}