2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Cataloging Public Objects Using Aerial and Street-Level Images — Urban Trees
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.647
J. D. Wegner, Steve Branson, David Hall, K. Schindler, P. Perona
Abstract: Each corner of the inhabited world is imaged from multiple viewpoints with increasing frequency. Online map services like Google Maps or Here Maps provide direct access to huge amounts of densely sampled, georeferenced images from street view and aerial perspective. There is an opportunity to design computer vision systems that will help us search, catalog and monitor public infrastructure, buildings and artifacts. We explore the architecture and feasibility of such a system. The main technical challenge is combining test-time information from multiple views of each geographic location (e.g., aerial and street views). We implement two modules: det2geo, which detects the set of locations of objects belonging to a given category, and geo2cat, which computes the fine-grained category of the object at a given location. We introduce a solution that adapts state-of-the-art CNN-based object detectors and classifiers. We test our method on "Pasadena Urban Trees", a new dataset of 80,000 trees with geographic and species annotations, and show that combining multiple views significantly improves both tree detection and tree species classification, rivaling human performance.
Pages: 6014-6023
Citations: 148
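The det2geo module has to combine detector evidence from several registered views of the same geographic location. Below is a minimal sketch of one plausible fusion rule, assuming calibrated per-view detection probabilities and conditionally independent views; the function name fuse_view_scores and the independence assumption are illustrative, and the paper's actual multi-view combination is not reproduced here.

```python
import numpy as np

def fuse_view_scores(view_probs, prior=0.01):
    """Fuse per-view detection probabilities for one geographic location.

    view_probs : iterable of P(object | view_i), e.g. from aerial and
                 street-view detectors. Assumes conditionally independent
                 views -- an illustrative simplification.
    prior      : prior probability that a location contains the object.
    Returns the fused posterior P(object | all views).
    """
    logit = lambda p: np.log(p) - np.log1p(-p)
    # Start from the prior, then add each view's evidence in log-odds space,
    # so views that agree reinforce each other.
    score = logit(prior) + sum(logit(p) - logit(prior) for p in view_probs)
    return 1.0 / (1.0 + np.exp(-score))

# One aerial view and two street-level views of the same candidate location:
print(fuse_view_scores([0.6, 0.8, 0.7]))  # fused posterior, far above the prior
```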
Interactive Segmentation on RGBD Images via Cue Selection
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.24
Jie Feng, Brian L. Price, Scott D. Cohen, Shih-Fu Chang
Abstract: Interactive image segmentation is an important problem in computer vision with many applications, including image editing, object recognition and image retrieval. Most existing interactive segmentation methods operate only on color images. Only recently have a few works been proposed that leverage depth information from low-cost sensors to improve interactive segmentation. While these methods achieve better results than color-based methods, they are still limited to either using depth as an additional color channel or simply combining depth with color in a linear way. We propose a novel interactive segmentation algorithm which can incorporate multiple feature cues, such as color, depth, and normals, in a unified graph-cut framework to leverage these cues more effectively. A key contribution of our method is that it automatically selects a single cue to be used at each pixel, based on the intuition that only one cue is necessary to determine the segmentation label locally. This is achieved by optimizing over both segmentation labels and cue labels, using terms designed to decide where both the segmentation and cue labels should change. Our algorithm thus produces not only the segmentation mask but also a cue label map that indicates where each cue contributes to the final result. Extensive experiments on five large-scale RGBD datasets show that our proposed algorithm performs significantly better than both color-based and other RGBD-based algorithms, reducing the amount of user input as well as increasing segmentation accuracy.
Pages: 156-164
Citations: 25
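The core of the method is a joint energy over per-pixel segmentation labels and per-pixel cue labels. A hedged sketch of what such an energy could look like, with placeholder potentials; the published unary and pairwise terms and the graph-cut optimization itself are not reproduced here.

```python
import numpy as np

def cue_selection_energy(seg, cue, unaries, pairs, lam=1.0, mu=0.5):
    """Evaluate an illustrative energy over segmentation and cue labels.

    seg     : (N,) array of {0,1} fg/bg labels, one per pixel.
    cue     : (N,) array of cue indices in {0..C-1} (e.g. color/depth/normal).
    unaries : (N, C, 2) array; unaries[i, c, l] is the cost of giving pixel i
              segmentation label l under cue c.
    pairs   : list of (i, j) neighbor index pairs.
    lam, mu : weights of the segmentation and cue smoothness terms.
    """
    n = np.arange(len(seg))
    data = unaries[n, cue, seg].sum()                    # data cost under each pixel's selected cue
    seg_smooth = sum(seg[i] != seg[j] for i, j in pairs)  # Potts prior on segmentation labels
    cue_smooth = sum(cue[i] != cue[j] for i, j in pairs)  # Potts prior on cue labels
    return data + lam * seg_smooth + mu * cue_smooth

# Toy 4-pixel chain: pixels 0-1 use depth (cue 1), pixels 2-3 use color (cue 0).
unaries = np.random.default_rng(0).random((4, 2, 2))
print(cue_selection_energy(np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0]),
                           unaries, pairs=[(0, 1), (1, 2), (2, 3)]))
```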
Harnessing Object and Scene Semantics for Large-Scale Video Understanding
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.339
Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, L. Sigal
Abstract: Large-scale action recognition and video categorization are important problems in computer vision. To address these problems, we propose a novel object- and scene-based semantic fusion network and representation. Our semantic fusion network combines three streams of information using a three-layer neural network: (i) frame-based low-level CNN features, (ii) object features from a state-of-the-art large-scale CNN object detector trained to recognize 20K classes, and (iii) scene features from a state-of-the-art CNN scene detector trained to recognize 205 scenes. The trained network achieves improvements in supervised activity and video categorization on two complex large-scale datasets, ActivityNet and FCVID, respectively. Further, by examining and backpropagating information through the fusion network, semantic relationships (correlations) between video classes and objects/scenes can be discovered. These video class-object and video class-scene relationships can in turn be used as a semantic representation for the video classes themselves. We illustrate the effectiveness of this semantic representation through experiments on zero-shot action/video classification and clustering.
Pages: 3112-3121
Citations: 87
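The fusion architecture is concrete enough to sketch: three feature streams concatenated and passed through a three-layer network that outputs video-class probabilities. The layer widths and the 239-way output below are illustrative stand-ins (FCVID-sized), not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

def fusion_forward(frame_feat, object_feat, scene_feat, params):
    # Concatenate the three streams and run a 3-layer MLP to class probabilities.
    x = np.concatenate([frame_feat, object_feat, scene_feat])
    h1 = relu(params["W1"] @ x + params["b1"])
    h2 = relu(params["W2"] @ h1 + params["b2"])
    logits = params["W3"] @ h2 + params["b3"]
    e = np.exp(logits - logits.max())          # softmax over video classes
    return e / e.sum()

# Illustrative sizes: 4096-d frame features, 20K object scores, 205 scene scores.
dims = [4096 + 20000 + 205, 512, 256, 239]
params = {}
for i in (1, 2, 3):
    params[f"W{i}"] = rng.normal(0.0, 0.01, (dims[i], dims[i - 1]))
    params[f"b{i}"] = np.zeros(dims[i])
probs = fusion_forward(rng.normal(size=4096), rng.random(20000), rng.random(205), params)
print(probs.shape, probs.sum())                # (239,) 1.0
```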
Large-Scale Semantic 3D Reconstruction: An Adaptive Multi-resolution Model for Multi-class Volumetric Labeling
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.346
M. Blaha, Christoph Vogel, Audrey Richard, J. D. Wegner, T. Pock, K. Schindler
Abstract: We propose an adaptive multi-resolution formulation of semantic 3D reconstruction. Given a set of images of a scene, semantic 3D reconstruction aims to densely reconstruct both the 3D shape of the scene and a segmentation into semantic object classes. Jointly reasoning about shape and class allows one to take into account class-specific shape priors (e.g., building walls should be smooth and vertical; conversely, smooth vertical surfaces are likely to be building walls), leading to improved reconstruction results. So far, semantic 3D reconstruction methods have been limited to small scenes and low resolution because of their large memory footprint and computational cost. To scale them up to large scenes, we propose a hierarchical scheme which refines the reconstruction only in regions that are likely to contain a surface, exploiting the fact that both high spatial resolution and high numerical precision are only required in those regions. Our scheme amounts to solving a sequence of convex optimizations while progressively removing constraints, in such a way that the energy, in each iteration, is the tightest possible approximation of the underlying energy at full resolution. In our experiments the method saves up to 98% memory and 95% computation time, without any loss of accuracy.
Pages: 3176-3184
Citations: 94
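The key scaling idea, refining only where a surface is likely, can be illustrated with a tiny selection rule: cells whose occupancy evidence is ambiguous get subdivided, while confidently free or occupied cells keep their coarse label. The thresholds and the dict-based cell representation are illustrative, not the paper's actual convex-energy criterion.

```python
import numpy as np

def refine_cells(cells, occ_prob, low=0.2, high=0.8):
    """Pick the cells worth subdividing in an adaptive multi-resolution scheme.

    cells    : list of cell identifiers at the current resolution level.
    occ_prob : dict mapping cell -> estimated probability that the cell is
               occupied (e.g. from coarse-level data terms).
    A cell is refined only when the evidence is ambiguous, i.e. the surface
    probably passes through it; confidently empty or solid cells keep their
    coarse label, which is where the memory and compute savings come from.
    """
    return [c for c in cells if low < occ_prob[c] < high]

cells = ["a", "b", "c", "d"]
occ = {"a": 0.02, "b": 0.55, "c": 0.97, "d": 0.35}
print(refine_cells(cells, occ))  # ['b', 'd'] -- only near-surface cells refined
```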
Automatic Image Cropping: A Computational Complexity Study
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.61
Jiansheng Chen, Gaocheng Bai, Shaoheng Liang, Zhengqin Li
Abstract: Attention-based automatic image cropping aims at preserving the most visually important region in an image. A common task in this kind of method is to search for the smallest rectangle inside which the summed attention is maximized. We demonstrate that under appropriate formulations, this task can be achieved using efficient algorithms with low computational complexity. In the practically useful scenario where the aspect ratio of the cropping rectangle is given, the problem can be solved with computational complexity linear in the number of image pixels. We also study the possibility of multiple-rectangle cropping and a new model facilitating fully automated image cropping.
Pages: 507-515
Citations: 93
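The integral-image (summed-area table) trick behind the linear-time result is easy to show for a single window size of the given aspect ratio: every window sum costs four table lookups, so scanning all positions is linear in the number of pixels. The paper's full algorithm extends this over all window sizes; this sketch handles one size.

```python
import numpy as np

def best_window(attn, h, w):
    """Find the h-by-w window with maximal summed attention in O(H*W)."""
    H, W = attn.shape
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = attn.cumsum(0).cumsum(1)          # summed-area table
    # Sum of every h x w window via four table lookups, vectorized:
    sums = ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]
    y, x = np.unravel_index(np.argmax(sums), sums.shape)
    return y, x, sums[y, x]                        # top-left corner and score

attn = np.random.default_rng(1).random((240, 320))
print(best_window(attn, 120, 160))                 # one fixed 4:3 crop size
```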
Occlusion-Free Face Alignment: Deep Regression Networks Coupled with De-Corrupt AutoEncoders
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.373
Jie Zhang, Meina Kan, S. Shan, Xilin Chen
Abstract: Face alignment, or facial landmark detection, plays an important role in many computer vision applications, e.g., face recognition, facial expression recognition, face animation, etc. However, the performance of face alignment systems degrades severely when occlusions occur. In this work, we propose a novel face alignment method which cascades several deep regression networks coupled with de-corrupt autoencoders (denoted DRDA) to explicitly handle the partial occlusion problem. Unlike previous works that can only detect occlusions and discard the occluded parts, our proposed de-corrupt autoencoder network can automatically recover the genuine appearance of the occluded parts, and the recovered parts can be leveraged together with the non-occluded parts for more accurate alignment. By coupling de-corrupt autoencoders with deep regression networks, a deep alignment model robust to partial occlusions is achieved. Besides, our method can localize occluded regions rather than merely predict whether the landmarks are occluded. Experiments on two challenging occluded face datasets demonstrate that our method significantly outperforms state-of-the-art methods.
Pages: 3428-3437
Citations: 95
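The de-corrupt autoencoder itself is a standard encoder-decoder trained to map an occluded patch to its genuine appearance. A forward-pass sketch with placeholder random weights; the real model is trained, and its layer sizes are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def decorrupt(patch, W_enc, b_enc, W_dec, b_dec):
    """One de-corrupt autoencoder pass: occluded patch in, recovered patch out.
    In the paper such autoencoders are trained so that occluded inputs map to
    their occlusion-free appearance; weights here are random placeholders to
    keep the sketch self-contained."""
    h = sigmoid(W_enc @ patch + b_enc)       # compressed code of the corrupted patch
    return sigmoid(W_dec @ h + b_dec)        # reconstruction of the clean appearance

d, k = 32 * 32, 256                          # patch pixels, code size (illustrative)
W_enc, W_dec = rng.normal(0, 0.05, (k, d)), rng.normal(0, 0.05, (d, k))
occluded = rng.random(d)
recovered = decorrupt(occluded, W_enc, np.zeros(k), W_dec, np.zeros(d))
# `recovered` would replace the occluded region before landmark regression.
```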
Monocular 3D Object Detection for Autonomous Driving
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.236
Xiaozhi Chen, Kaustav Kundu, Ziyu Zhang, Huimin Ma, S. Fidler, R. Urtasun
Abstract: The goal of this paper is to perform 3D object detection from a single monocular image in the domain of autonomous driving. Our method first aims to generate a set of candidate class-specific object proposals, which are then run through a standard CNN pipeline to obtain high-quality object detections. The focus of this paper is on proposal generation. In particular, we propose an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground plane. We then score each candidate box projected to the image plane via several intuitive potentials encoding semantic segmentation, contextual information, size and location priors, and typical object shape. Our experimental evaluation demonstrates that our object proposal generation approach significantly outperforms all monocular approaches and achieves the best detection performance on the challenging KITTI benchmark among published monocular competitors.
Pages: 2147-2156
Citations: 795
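The proposal-scoring step is a weighted sum of potentials evaluated on each ground-plane candidate after projection into the image. The stand-in potentials below only mimic the flavor of the paper's cues (segmentation overlap, size priors); they are not the published terms.

```python
import numpy as np

def score_proposals(proposals, potentials, weights):
    """Score 3D box proposals as a weighted sum of potentials.

    proposals  : list of candidate boxes already placed on the ground plane.
    potentials : dict of name -> function(box) returning a scalar potential.
    weights    : dict of name -> weight.
    """
    return [sum(weights[k] * f(box) for k, f in potentials.items())
            for box in proposals]

# Stand-in potentials: each box is (x, z, w, h, l) on the ground plane.
potentials = {
    "seg":  lambda b: np.exp(-abs(b[0]) / 10.0),               # favor boxes near the lane center
    "size": lambda b: np.exp(-abs(b[2] * b[3] * b[4] - 8.0)),  # car-like volume prior
}
weights = {"seg": 1.0, "size": 0.5}
boxes = [(0.0, 12.0, 1.8, 1.5, 4.0), (15.0, 40.0, 1.0, 1.0, 1.0)]
print(score_proposals(boxes, potentials, weights))  # the car-like box scores higher
```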
BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.328
Jakub Sochor, A. Herout, Jirí Havel
Abstract: We address the problem of fine-grained vehicle make & model recognition and verification. Our contribution is showing that extracting additional data from the video stream, besides the vehicle image itself, and feeding it into a deep convolutional neural network boosts recognition performance considerably. This additional information includes: the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and information about the 3D vehicle orientation. Experiments show that adding such information decreases the classification error by 26% (accuracy improved from 0.772 to 0.832) and boosts verification average precision by 208% (0.378 to 0.785) compared to a baseline pure CNN without any input modifications. The pure baseline CNN also outperforms the recent state-of-the-art solution by 0.081. We provide an annotated dataset, "BoxCars", of surveillance vehicle images augmented with various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.
Pages: 3006-3015
Citations: 161
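The additional data amounts to a few extra CNN inputs derived from the 3D bounding box. A hedged sketch of assembling them, where the face-crop stacking and vector encodings are one workable guess at a layout rather than the paper's exact input format.

```python
import numpy as np

def boxcars_aux_input(face_crops, shape_mask, view_dir):
    """Assemble extra inputs alongside the vehicle image: the image 'unpacked'
    along the 3D box faces, a rasterized low-resolution shape, and the 3D
    viewing direction. The encodings here are illustrative."""
    unpacked = np.concatenate(face_crops, axis=1)            # box faces laid side by side
    shape = (shape_mask > 0.5).astype(np.float32)            # binarized low-res silhouette
    view = np.asarray(view_dir) / np.linalg.norm(view_dir)   # unit orientation vector
    return unpacked, shape, view

faces = [np.random.rand(64, 64, 3) for _ in range(3)]       # rectified box-face crops
aux = boxcars_aux_input(faces, np.random.rand(16, 16), (0.3, -0.1, 0.9))
print(aux[0].shape, aux[1].shape, aux[2])                    # (64, 192, 3) (16, 16) unit vector
```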
A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.74
Siyu Zhu, R. Zanibbi
Abstract: We propose a system that finds text in natural scenes using a variety of cues. Our novel data-driven method incorporates coarse-to-fine detection of character pixels using convolutional features (Text-Conv), followed by extracting connected components (CCs) from characters using edge and color features, and finally performing a graph-based segmentation of CCs into words (Word-Graph). For Text-Conv, the initial detection is based on convolutional feature maps similar to those used in Convolutional Neural Networks (CNNs), but learned using convolutional k-means. Convolution masks defined by local and neighboring patch features are used to improve detection accuracy. The Word-Graph algorithm uses contextual information to both improve word segmentation and prune false character/word detections. Different definitions of foreground (text) regions are used to train the detection stages, some based on bounding-box intersection and others on bounding-box and pixel intersection. Our system obtains pixel, character, and word detection f-measures of 93.14%, 90.26%, and 86.77%, respectively, on the ICDAR 2015 Robust Reading Focused Scene Text dataset, outperforming state-of-the-art systems. This approach may work for other detection targets with homogeneous color in natural scenes.
Pages: 625-632
Citations: 64
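Convolutional k-means, the paper's filter-learning step, replaces backprop-trained filters with centroids of contrast-normalized image patches. A compact Lloyd's-iteration version; whitening, which such pipelines typically include, is omitted for brevity.

```python
import numpy as np

def convolutional_kmeans(images, patch=8, k=64, iters=10, seed=0):
    """Learn convolution filters as k-means centroids of normalized patches."""
    rng = np.random.default_rng(seed)
    # Sample random patches and contrast-normalize them.
    patches = []
    for img in images:
        for _ in range(200):
            y = rng.integers(0, img.shape[0] - patch)
            x = rng.integers(0, img.shape[1] - patch)
            p = img[y:y + patch, x:x + patch].ravel()
            patches.append((p - p.mean()) / (p.std() + 1e-8))
    X = np.array(patches)
    # Plain Lloyd's k-means; the centroids become patch x patch filters.
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                C[j] = X[assign == j].mean(0)
    return C.reshape(k, patch, patch)

imgs = [np.random.rand(64, 64) for _ in range(5)]
filters = convolutional_kmeans(imgs)
print(filters.shape)  # (64, 8, 8) learned filter bank
```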
DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.73
Saumitro Dasgupta, Kuan Fang, Kevin Chen, S. Savarese
Abstract: We consider the problem of estimating the spatial layout of an indoor scene from a monocular RGB image, modeled as the projection of a 3D cuboid. Existing solutions to this problem often rely strongly on hand-engineered features and vanishing-point detection, which are prone to failure in the presence of clutter. In this paper, we present a method that uses a fully convolutional neural network (FCNN) in conjunction with a novel optimization framework for generating layout estimates. We demonstrate that our method is robust in the presence of clutter and handles a wide range of highly challenging scenes. We evaluate our method on two standard benchmarks and show that it achieves state-of-the-art results, outperforming previous methods by a wide margin.
Pages: 616-624
Citations: 116
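One plausible core of the optimization over layouts: score each candidate cuboid layout by the log-likelihood of its per-pixel surface assignment under the FCNN's softmax output. The 5-surface labeling and the candidate representation below are illustrative; the paper's actual objective is richer.

```python
import numpy as np

def layout_log_likelihood(prob_map, layout_mask):
    """Score a candidate room layout against per-pixel FCN output.

    prob_map    : (H, W, K) per-pixel softmax over K layout surfaces
                  (e.g. floor, ceiling, left/front/right wall).
    layout_mask : (H, W) integer map assigning each pixel to a surface under
                  the candidate cuboid layout.
    """
    H, W, _ = prob_map.shape
    yy, xx = np.mgrid[:H, :W]
    return np.log(prob_map[yy, xx, layout_mask] + 1e-12).mean()

# Pick the best of several hypothesized layouts by likelihood:
rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(5), size=(60, 80))       # stand-in FCN output
candidates = [rng.integers(0, 5, (60, 80)) for _ in range(4)]
best = max(candidates, key=lambda m: layout_log_likelihood(probs, m))
```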