2013 IEEE Conference on Computer Vision and Pattern Recognition: Latest Publications

Manhattan Scene Understanding via XSlit Imaging
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.18
Jinwei Ye, Yu Ji, Jingyi Yu
{"title":"Manhattan Scene Understanding via XSlit Imaging","authors":"Jinwei Ye, Yu Ji, Jingyi Yu","doi":"10.1109/CVPR.2013.18","DOIUrl":"https://doi.org/10.1109/CVPR.2013.18","url":null,"abstract":"A Manhattan World (MW) [3] is composed of planar surfaces and parallel lines aligned with three mutually orthogonal principal axes. Traditional MW understanding algorithms rely on geometry priors such as the vanishing points and reference (ground) planes for grouping coplanar structures. In this paper, we present a novel single-image MW reconstruction algorithm from the perspective of non-pinhole cameras. We show that by acquiring the MW using an XSlit camera, we can instantly resolve co planarity ambiguities. Specifically, we prove that parallel 3D lines map to 2D curves in an XSlit image and they converge at an XSlit Vanishing Point (XVP). In addition, if the lines are coplanar, their curved images will intersect at a second common pixel that we call Coplanar Common Point (CCP). CCP is a unique image feature in XSlit cameras that does not exist in pinholes. We present a comprehensive theory to analyze XVPs and CCPs in a MW scene and study how to recover 3D geometry in a complex MW scene from XVPs and CCPs. Finally, we build a prototype XSlit camera by using two layers of cylindrical lenses. Experimental results on both synthetic and real data show that our new XSlit-camera-based solution provides an effective and reliable solution for MW understanding.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"47 1","pages":"81-88"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90353546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Finding Group Interactions in Social Clutter
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.351
Ruonan Li, Parker Porfilio, Todd E. Zickler
{"title":"Finding Group Interactions in Social Clutter","authors":"Ruonan Li, Parker Porfilio, Todd E. Zickler","doi":"10.1109/CVPR.2013.351","DOIUrl":"https://doi.org/10.1109/CVPR.2013.351","url":null,"abstract":"We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space, the extent of their interaction is localized in time, and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pair-wise interactions, and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"74 1","pages":"2722-2729"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86560998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Poselet Key-Framing: A Model for Human Activity Recognition
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.342
Michalis Raptis, L. Sigal
{"title":"Poselet Key-Framing: A Model for Human Activity Recognition","authors":"Michalis Raptis, L. Sigal","doi":"10.1109/CVPR.2013.342","DOIUrl":"https://doi.org/10.1109/CVPR.2013.342","url":null,"abstract":"In this paper, we develop a new model for recognizing human actions. An action is modeled as a very sparse sequence of temporally local discriminative key frames - collections of partial key-poses of the actor(s), depicting key states in the action sequence. We cast the learning of key frames in a max-margin discriminative framework, where we treat key frames as latent variables. This allows us to (jointly) learn a set of most discriminative key frames while also learning the local temporal context between them. Key frames are encoded using a spatially-localizable pose let-like representation with HoG and BoW components learned from weak annotations, we rely on structured SVM formulation to align our components and mine for hard negatives to boost localization performance. This results in a model that supports spatio-temporal localization and is insensitive to dropped frames or partial observations. We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an on-line streaming setting.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"76 1","pages":"2650-2657"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86639944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 235
Representing Videos Using Mid-level Discriminative Patches
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.332
Arpit Jain, A. Gupta, Mikel D. Rodriguez, L. Davis
{"title":"Representing Videos Using Mid-level Discriminative Patches","authors":"Arpit Jain, A. Gupta, Mikel D. Rodriguez, L. Davis","doi":"10.1109/CVPR.2013.332","DOIUrl":"https://doi.org/10.1109/CVPR.2013.332","url":null,"abstract":"How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification where they demonstrate state-of-the-art performance on UCF50 and Olympics datasets.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"3 1","pages":"2571-2578"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88028625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 155
Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.200
Abed C. Malti, R. Hartley, A. Bartoli, Jae-Hak Kim
{"title":"Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity","authors":"Abed C. Malti, R. Hartley, A. Bartoli, Jae-Hak Kim","doi":"10.1109/CVPR.2013.200","DOIUrl":"https://doi.org/10.1109/CVPR.2013.200","url":null,"abstract":"We propose a new approach for template-based extensible surface reconstruction from a single view. We extend the method of isometric surface reconstruction and more recent work on conformal surface reconstruction. Our approach relies on the minimization of a proposed stretching energy formalized with respect to the Poisson ratio parameter of the surface. We derive a patch-based formulation of this stretching energy by assuming local linear elasticity. This formulation unifies geometrical and mechanical constraints in a single energy term. We prevent local scale ambiguities by imposing a set of fixed boundary 3D points. We experimentally prove the sufficiency of this set of boundary points and demonstrate the effectiveness of our approach on different developable and non-developable surfaces with a wide range of extensibility.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"34 1","pages":"1522-1529"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87967267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 63
Discriminative Non-blind Deblurring
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.84
Uwe Schmidt, C. Rother, Sebastian Nowozin, Jeremy Jancsary, S. Roth
{"title":"Discriminative Non-blind Deblurring","authors":"Uwe Schmidt, C. Rother, Sebastian Nowozin, Jeremy Jancsary, S. Roth","doi":"10.1109/CVPR.2013.84","DOIUrl":"https://doi.org/10.1109/CVPR.2013.84","url":null,"abstract":"Non-blind deblurring is an integral component of blind approaches for removing image blur due to camera shake. Even though learning-based deblurring methods exist, they have been limited to the generative case and are computationally expensive. To this date, manually-defined models are thus most widely used, though limiting the attained restoration quality. We address this gap by proposing a discriminative approach for non-blind deblurring. One key challenge is that the blur kernel in use at test time is not known in advance. To address this, we analyze existing approaches that use half-quadratic regularization. From this analysis, we derive a discriminative model cascade for image deblurring. Our cascade model consists of a Gaussian CRF at each stage, based on the recently introduced regression tree fields. We train our model by loss minimization and use synthetically generated blur kernels to generate training data. Our experiments show that the proposed approach is efficient and yields state-of-the-art restoration quality on images corrupted with synthetic and real blur.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"43 1","pages":"604-611"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88695509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 122
Learning Class-to-Image Distance with Object Matchings
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.108
Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori
{"title":"Learning Class-to-Image Distance with Object Matchings","authors":"Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori","doi":"10.1109/CVPR.2013.108","DOIUrl":"https://doi.org/10.1109/CVPR.2013.108","url":null,"abstract":"We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in training images for an image class are treated as a collage. When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"214 1","pages":"795-802"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89021848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
GeoF: Geodesic Forests for Learning Coupled Predictors
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.16
P. Kontschieder, Pushmeet Kohli, J. Shotton, A. Criminisi
{"title":"GeoF: Geodesic Forests for Learning Coupled Predictors","authors":"P. Kontschieder, Pushmeet Kohli, J. Shotton, A. Criminisi","doi":"10.1109/CVPR.2013.16","DOIUrl":"https://doi.org/10.1109/CVPR.2013.16","url":null,"abstract":"Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently [3, 5, 8]. This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. Random field models, instead, encourage spatial consistency of labels at increased computational expense. This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. Our model can be thought of as a generalization of the successful Semantic Texton Forest, Auto-Context, and Entangled Forest models. A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. Our GeoF model is validated quantitatively on the task of semantic image segmentation, on four challenging and very diverse image datasets. GeoF outperforms both state of-the-art forest models and the conventional pair wise CRF.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"61 11 1","pages":"65-72"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83546441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 71
Large Displacement Optical Flow from Nearest Neighbor Fields
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.316
Zhuoyuan Chen, Hailin Jin, Zhe L. Lin, Scott D. Cohen, Ying Wu
{"title":"Large Displacement Optical Flow from Nearest Neighbor Fields","authors":"Zhuoyuan Chen, Hailin Jin, Zhe L. Lin, Scott D. Cohen, Ying Wu","doi":"10.1109/CVPR.2013.316","DOIUrl":"https://doi.org/10.1109/CVPR.2013.316","url":null,"abstract":"We present an optical flow algorithm for large displacement motions. Most existing optical flow methods use the standard coarse-to-fine framework to deal with large displacement motions which has intrinsic limitations. Instead, we formulate the motion estimation problem as a motion segmentation problem. We use approximate nearest neighbor fields to compute an initial motion field and use a robust algorithm to compute a set of similarity transformations as the motion candidates for segmentation. To account for deviations from similarity transformations, we add local deformations in the segmentation process. We also observe that small objects can be better recovered using translations as the motion candidates. We fuse the motion results obtained under similarity transformations and under translations together before a final refinement. Experimental validation shows that our method can successfully handle large displacement motions. Although we particularly focus on large displacement motions in this work, we make no sacrifice in terms of overall performance. In particular, our method ranks at the top of the Middlebury benchmark.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"44 1","pages":"2443-2450"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78886969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 168
Intrinsic Characterization of Dynamic Surfaces
2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date: 2013-06-23 DOI: 10.1109/CVPR.2013.37
Tony Tung, T. Matsuyama
{"title":"Intrinsic Characterization of Dynamic Surfaces","authors":"Tony Tung, T. Matsuyama","doi":"10.1109/CVPR.2013.37","DOIUrl":"https://doi.org/10.1109/CVPR.2013.37","url":null,"abstract":"This paper presents a novel approach to characterize deformable surface using intrinsic property dynamics. 3D dynamic surfaces representing humans in motion can be obtained using multiple view stereo reconstruction methods or depth cameras. Nowadays these technologies have become capable to capture surface variations in real-time, and give details such as clothing wrinkles and deformations. Assuming repetitive patterns in the deformations, we propose to model complex surface variations using sets of linear dynamical systems (LDS) where observations across time are given by surface intrinsic properties such as local curvatures. We introduce an approach based on bags of dynamical systems, where each surface feature to be represented in the codebook is modeled by a set of LDS equipped with timing structure. Experiments are performed on datasets of real-world dynamical surfaces and show compelling results for description, classification and segmentation.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"16 1","pages":"233-240"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80104596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6