2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

How Hard Can It Be? Estimating the Difficulty of Visual Search in an Image
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.237
Radu Tudor Ionescu, B. Alexe, Marius Leordeanu, M. Popescu, Dim P. Papadopoulos, V. Ferrari
{"title":"How Hard Can It Be? Estimating the Difficulty of Visual Search in an Image","authors":"Radu Tudor Ionescu, B. Alexe, Marius Leordeanu, M. Popescu, Dim P. Papadopoulos, V. Ferrari","doi":"10.1109/CVPR.2016.237","DOIUrl":"https://doi.org/10.1109/CVPR.2016.237","url":null,"abstract":"We address the problem of estimating image difficulty defined as the human response time for solving a visual search task. We collect human annotations of image difficulty for the PASCAL VOC 2012 data set through a crowd-sourcing platform. We then analyze what human interpretable image properties can have an impact on visual search difficulty, and how accurate are those properties for predicting difficulty. Next, we build a regression model based on deep features learned with state of the art convolutional neural networks and show better results for predicting the ground-truth visual search difficulty scores produced by human annotators. Our model is able to correctly rank about 75% image pairs according to their difficulty score. We also show that our difficulty predictor generalizes well to new classes not seen during training. Finally, we demonstrate that our predicted difficulty scores are useful for weakly supervised object localization (8% improvement) and semi-supervised object classification (1% improvement).","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"55 1","pages":"2157-2166"},"PeriodicalIF":0.0,"publicationDate":"2016-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86773073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 112
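As a rough illustration of the kind of pipeline the abstract describes, the sketch below fits a regression model to placeholder feature vectors (standing in for the deep CNN features) and scores it by pairwise ranking accuracy, the metric behind the "about 75% of image pairs" figure. The synthetic data, feature dimensionality and choice of a linear SVR here are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

# Minimal sketch: random vectors stand in for deep CNN features, and a linear SVR
# stands in for the paper's regression model on those features.
rng = np.random.default_rng(0)
n, d = 500, 128
X = rng.normal(size=(n, d))                          # stand-in deep features
w = rng.normal(size=d)
y = X @ w + 0.5 * rng.normal(size=n)                 # stand-in difficulty scores

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pred = SVR(kernel="linear").fit(X_tr, y_tr).predict(X_te)

# Pairwise ranking accuracy: how often the predicted ordering of an image pair
# agrees with the ground-truth ordering of their difficulty scores.
correct = total = 0
for i in range(len(y_te)):
    for j in range(i + 1, len(y_te)):
        if y_te[i] == y_te[j]:
            continue
        total += 1
        correct += (pred[i] > pred[j]) == (y_te[i] > y_te[j])
print("pairwise ranking accuracy:", correct / total)
```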
Simultaneous Optical Flow and Intensity Estimation from an Event Camera
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.102
Patrick Bardow, A. Davison, Stefan Leutenegger
{"title":"Simultaneous Optical Flow and Intensity Estimation from an Event Camera","authors":"Patrick Bardow, A. Davison, Stefan Leutenegger","doi":"10.1109/CVPR.2016.102","DOIUrl":"https://doi.org/10.1109/CVPR.2016.102","url":null,"abstract":"Event cameras are bio-inspired vision sensors which mimic retinas to measure per-pixel intensity change rather than outputting an actual intensity image. This proposed paradigm shift away from traditional frame cameras offers significant potential advantages: namely avoiding high data rates, dynamic range limitations and motion blur. Unfortunately, however, established computer vision algorithms may not at all be applied directly to event cameras. Methods proposed so far to reconstruct images, estimate optical flow, track a camera and reconstruct a scene come with severe restrictions on the environment or on the motion of the camera, e.g. allowing only rotation. Here, we propose, to the best of our knowledge, the first algorithm to simultaneously recover the motion field and brightness image, while the camera undergoes a generic motion through any scene. Our approach employs minimisation of a cost function that contains the asynchronous event data as well as spatial and temporal regularisation within a sliding window time interval. Our implementation relies on GPU optimisation and runs in near real-time. In a series of examples, we demonstrate the successful operation of our framework, including in situations where conventional cameras suffer from dynamic range limitations and motion blur.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"43 1","pages":"884-892"},"PeriodicalIF":0.0,"publicationDate":"2016-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88286538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 220
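The sketch below is a drastically simplified, hypothetical stand-in for the kind of cost minimisation the abstract mentions: a log-intensity image and a single global flow vector are optimised jointly so that the flow-induced brightness change explains an accumulated event image, with a total-variation term as spatial regularisation. The real method operates on the raw asynchronous event stream, recovers a dense per-pixel motion field and adds temporal regularisation over a sliding window; everything below (synthetic data, global flow, Adam optimiser) is an assumption made for illustration.

```python
import torch

# Toy joint estimation of intensity I and flow v from an accumulated event image E.
torch.manual_seed(0)
H, W, dt, C = 32, 32, 1.0, 0.2                      # C: event contrast threshold

def grad_xy(img):
    gx = torch.zeros_like(img); gy = torch.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy

# Synthetic "observation": a horizontal intensity ramp translating to the right
# produces events proportional to -(grad I . v) per unit time.
I_true = torch.linspace(0, 1, W).repeat(H, 1)
v_true = torch.tensor([1.0, 0.0])
gx, gy = grad_xy(I_true)
E = -(gx * v_true[0] + gy * v_true[1]) * dt / C     # accumulated signed events

I = (0.1 * torch.randn(H, W)).requires_grad_()
v = torch.tensor([0.5, 0.5], requires_grad=True)
opt = torch.optim.Adam([I, v], lr=0.05)

for step in range(1500):
    opt.zero_grad()
    gx, gy = grad_xy(I)
    data = ((C * E + (gx * v[0] + gy * v[1]) * dt) ** 2).mean()   # event data term
    tv = (gx.abs() + gy.abs()).mean()                             # spatial smoothness
    (data + 0.01 * tv).backward()
    opt.step()

# Intensity and flow are only recoverable up to a joint scale in this toy setting;
# the print just shows that the optimisation drives the event data term down.
print("final data term:", float(data))
```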
Multivariate Regression on the Grassmannian for Predicting Novel Domains
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.548
Yongxin Yang, Timothy M. Hospedales
{"title":"Multivariate Regression on the Grassmannian for Predicting Novel Domains","authors":"Yongxin Yang, Timothy M. Hospedales","doi":"10.1109/CVPR.2016.548","DOIUrl":"https://doi.org/10.1109/CVPR.2016.548","url":null,"abstract":"We study the problem of predicting how to recognise visual objects in novel domains with neither labelled nor unlabelled training data. Domain adaptation is now an established research area due to its value in ameliorating the issue of domain shift between train and test data. However, it is conventionally assumed that domains are discrete entities, and that at least unlabelled data is provided in testing domains. In this paper, we consider the case where domains are parametrised by a vector of continuous values (e.g., time, lighting or view angle). We aim to use such domain metadata to predict novel domains for recognition. This allows a recognition model to be pre-calibrated for a new domain in advance (e.g., future time or view angle) without waiting for data collection and re-training. We achieve this by posing the problem as one of multivariate regression on the Grassmannian, where we regress a domain's subspace (point on the Grassmannian) against an independent vector of domain parameters. We derive two novel methodologies to achieve this challenging task: a direct kernel regression from RM ! G, and an indirect method with better extrapolation properties. We evaluate our methods on two crossdomain visual recognition benchmarks, where they perform close to the upper bound of full data domain adaptation. This demonstrates that data is not necessary for domain adaptation if a domain can be parametrically described.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"6 1","pages":"5071-5080"},"PeriodicalIF":0.0,"publicationDate":"2016-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84776732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
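To make the core idea concrete, the sketch below regresses a domain's subspace against a scalar domain parameter in the crudest possible way: each training domain contributes its top-k PCA basis, and a novel domain's subspace is predicted by kernel-weighted averaging of projection matrices followed by an eigendecomposition (a chordal mean). This is only a naive stand-in; the paper performs proper multivariate regression on the Grassmann manifold with two dedicated methodologies. The synthetic data and bandwidth are assumptions.

```python
import numpy as np

# Each "domain" is indexed by a scalar t; its subspace is the top-k PCA basis of
# that domain's data, and the dominant subspace rotates smoothly with t.
rng = np.random.default_rng(0)
d, k = 5, 2

def domain_basis(t, n=200):
    theta = 0.5 * t
    R = np.eye(d)
    R[:2, :2] = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
    X = rng.normal(size=(n, d)) * np.array([3.0, 2.0, 0.3, 0.3, 0.3])
    X = X @ R.T
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return Vt[:k].T                                  # d x k orthonormal basis

train_t = np.array([0.0, 1.0, 2.0, 3.0])
bases = [domain_basis(t) for t in train_t]

def predict_subspace(t_new, bandwidth=1.0):
    # Kernel-weighted mean of projection matrices, then take its top-k eigenvectors.
    w = np.exp(-((train_t - t_new) ** 2) / (2 * bandwidth ** 2))
    w /= w.sum()
    P = sum(wi * (U @ U.T) for wi, U in zip(w, bases))
    _, vecs = np.linalg.eigh(P)
    return vecs[:, -k:]

U_pred = predict_subspace(2.5)
U_true = domain_basis(2.5)
# Compare subspaces via principal angles (0 means identical subspaces).
cosines = np.linalg.svd(U_pred.T @ U_true, compute_uv=False)
print("largest principal angle (rad):", float(np.arccos(np.clip(cosines.min(), -1, 1))))
```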
Sketch Me That Shoe
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.93
Qian Yu, Feng Liu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales, Chen Change Loy
{"title":"Sketch Me That Shoe","authors":"Qian Yu, Feng Liu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales, Chen Change Loy","doi":"10.1109/CVPR.2016.93","DOIUrl":"https://doi.org/10.1109/CVPR.2016.93","url":null,"abstract":"We investigate the problem of fine-grained sketch-based image retrieval (SBIR), where free-hand human sketches are used as queries to perform instance-level retrieval of images. This is an extremely challenging task because (i) visual comparisons not only need to be fine-grained but also executed cross-domain, (ii) free-hand (finger) sketches are highly abstract, making fine-grained matching harder, and most importantly (iii) annotated cross-domain sketch-photo datasets required for training are scarce, challenging many state-of-the-art machine learning techniques. In this paper, for the first time, we address all these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based image retrieval application. We introduce a new database of 1,432 sketchphoto pairs from two categories with 32,000 fine-grained triplet ranking annotations. We then develop a deep tripletranking model for instance-level SBIR with a novel data augmentation and staged pre-training strategy to alleviate the issue of insufficient fine-grained training data. Extensive experiments are carried out to contribute a variety of insights into the challenges of data sufficiency and over-fitting avoidance when training deep networks for finegrained cross-domain ranking tasks.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"15 1","pages":"799-807"},"PeriodicalIF":0.0,"publicationDate":"2016-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74513871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 379
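The heart of the model described above is a triplet ranking objective: a sketch (anchor) should embed closer to its matching photo than to a non-matching photo. The sketch below shows that objective on placeholder features with a tiny fully-connected embedding network; the paper's actual CNN branches, data augmentation and staged pre-training are not reproduced here, and all dimensions and data are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative triplet-ranking training loop on placeholder features.
torch.manual_seed(0)

embed = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 64))
criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(embed.parameters(), lr=1e-3)

for step in range(100):
    # Placeholder features: sketch anchors, matching photos, non-matching photos.
    anchor = torch.randn(32, 512)
    positive = anchor + 0.1 * torch.randn(32, 512)
    negative = torch.randn(32, 512)

    a, p, n = (F.normalize(embed(x), dim=1) for x in (anchor, positive, negative))
    loss = criterion(a, p, n)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At retrieval time, photos would be ranked by embedding distance to a query sketch.
```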
Discovering the Physical Parts of an Articulated Object Class from Multiple Videos
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.84
Luca Del Pero, Susanna Ricco, R. Sukthankar, V. Ferrari
{"title":"Discovering the Physical Parts of an Articulated Object Class from Multiple Videos","authors":"Luca Del Pero, Susanna Ricco, R. Sukthankar, V. Ferrari","doi":"10.1109/CVPR.2016.84","DOIUrl":"https://doi.org/10.1109/CVPR.2016.84","url":null,"abstract":"We propose a motion-based method to discover the physical parts of an articulated object class (e.g. head/torso/leg of a horse) from multiple videos. The key is to find object regions that exhibit consistent motion relative to the rest of the object, across multiple videos. We can then learn a location model for the parts and segment them accurately in the individual videos using an energy function that also enforces temporal and spatial consistency in part motion. Unlike our approach, traditional methods for motion segmentation or non-rigid structure from motion operate on one video at a time. Hence they cannot discover a part unless it displays independent motion in that particular video. We evaluate our method on a new dataset of 32 videos of tigers and horses, where we significantly outperform a recent motion segmentation method on the task of part discovery (obtaining roughly twice the accuracy).","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"714-723"},"PeriodicalIF":0.0,"publicationDate":"2016-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88124388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
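A toy way to see the key idea, i.e. that parts are regions whose motion is consistent relative to the rest of the object, is sketched below: point trajectories are expressed relative to the object's overall (median) motion and the residual motions are clustered. The authors' method instead uses an energy function with temporal and spatial consistency terms and pools evidence across multiple videos; the synthetic trajectories and the k-means step here are purely illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic point trajectories: "body" tracks follow the global object motion,
# "leg" tracks additionally swing back and forth relative to the body.
rng = np.random.default_rng(0)
T, n_body, n_leg = 20, 60, 30

body_motion = np.cumsum(rng.normal(0.5, 0.05, size=(T, 2)), axis=0)   # global drift
leg_swing = np.stack([np.sin(np.linspace(0, 4 * np.pi, T)), np.zeros(T)], axis=1)

body_tracks = body_motion[None] + rng.normal(0, 0.05, size=(n_body, T, 2))
leg_tracks = (body_motion + 2.0 * leg_swing)[None] + rng.normal(0, 0.05, size=(n_leg, T, 2))
tracks = np.concatenate([body_tracks, leg_tracks])                    # (N, T, 2)

# Motion relative to the rest of the object: subtract the per-frame median track,
# then cluster the residual velocities; tracks in one cluster behave like one part.
relative = tracks - np.median(tracks, axis=0, keepdims=True)
velocities = np.diff(relative, axis=1).reshape(len(tracks), -1)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(velocities)
print("body track cluster counts:", np.bincount(labels[:n_body]))
print("leg  track cluster counts:", np.bincount(labels[n_body:]))
```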
Kinematic Structure Correspondences via Hypergraph Matching
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-12-12 DOI: 10.1109/CVPR.2016.457
H. Chang, Tobias Fischer, Maxime Petit, Martina Zambelli, Y. Demiris
{"title":"Kinematic Structure Correspondences via Hypergraph Matching","authors":"H. Chang, Tobias Fischer, Maxime Petit, Martina Zambelli, Y. Demiris","doi":"10.1109/CVPR.2016.457","DOIUrl":"https://doi.org/10.1109/CVPR.2016.457","url":null,"abstract":"In this paper, we present a novel framework for finding the kinematic structure correspondence between two objects in videos via hypergraph matching. In contrast to prior appearance and graph alignment based matching methods which have been applied among two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos. Our main contributions can be summarised as follows: (i) casting the kinematic structure correspondence problem into a hypergraph matching problem, incorporating multi-order similarities with normalising weights, (ii) a structural topology similarity measure by a new topology constrained subgraph isomorphism aggregation, (iii) a kinematic correlation measure between pairwise nodes, and (iv) a combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold. We demonstrate the robustness and accuracy of our method through a number of experiments on complex articulated synthetic and real data.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"10 1","pages":"4216-4225"},"PeriodicalIF":0.0,"publicationDate":"2016-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89471364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
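The sketch below illustrates generic higher-order (hypergraph) matching by tensor power iteration on a toy point-set example: candidate correspondences are scored by how well triplets of points agree geometrically, and the assignment vector that concentrates this third-order affinity is extracted and greedily discretised. It is a hypothetical illustration of the general technique only; the paper's formulation additionally incorporates the topology, kinematic-correlation and local-motion similarity measures listed in the abstract.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
P = rng.random((6, 2))                          # nodes of structure A
R = np.array([[0.8, -0.6], [0.6, 0.8]])         # a rotation
Q = P @ R.T + 1.0                               # structure B: rotated + translated copy
n = len(P)
matches = [(i, j) for i in range(n) for j in range(n)]   # candidate correspondences

def tri_feat(X, tri):
    a, b, c = X[list(tri)]
    d = np.array([np.linalg.norm(a - b), np.linalg.norm(b - c), np.linalg.norm(c - a)])
    return d / d.sum()                          # scale-invariant triangle descriptor

# Third-order affinities: a triple of matches is rewarded when the two triangles agree.
triples = list(combinations(range(n), 3))
entries = []
for ta in triples:
    fa = tri_feat(P, ta)
    for tb in triples:
        fb = tri_feat(Q, tb)
        w = float(np.exp(-np.sum((fa - fb) ** 2) / 0.01))
        if w > 1e-3:
            # Only the order-preserving pairing of the two triples is scored here;
            # a full implementation would consider all vertex orderings.
            idx = tuple(matches.index((ta[k], tb[k])) for k in range(3))
            entries.append((idx, w))

# Tensor power iteration on the soft assignment vector.
x = np.ones(len(matches))
for _ in range(30):
    y = np.zeros_like(x)
    for (i, j, k), w in entries:
        y[i] += w * x[j] * x[k]
        y[j] += w * x[i] * x[k]
        y[k] += w * x[i] * x[j]
    x = y / (np.linalg.norm(y) + 1e-12)

# Greedy one-to-one discretisation of the strongest assignments.
assignment, used_a, used_b = {}, set(), set()
for m in np.argsort(-x):
    i, j = matches[m]
    if i not in used_a and j not in used_b:
        assignment[i] = j
        used_a.add(i); used_b.add(j)
print(assignment)   # on this synthetic rigid pair, the identity map should dominate
```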
iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-07-01 DOI: 10.1109/CVPR.2016.244
A. Borji, S. Izadi, L. Itti
{"title":"iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning","authors":"A. Borji, S. Izadi, L. Itti","doi":"10.1109/CVPR.2016.244","DOIUrl":"https://doi.org/10.1109/CVPR.2016.244","url":null,"abstract":"Tolerance to image variations (e.g., translation, scale, pose, illumination, background) is an important desired property of any object recognition system, be it human or machine. Moving towards increasingly bigger datasets has been trending in computer vision especially with the emergence of highly popular deep learning models. While being very useful for learning invariance to object inter-and intra-class shape variability, these large-scale wild datasets are not very useful for learning invariance to other parameters urging researchers to resort to other tricks for training models. In this work, we introduce a large-scale synthetic dataset, which is freely and publicly available, and use it to answer several fundamental questions regarding selectivity and invariance properties of convolutional neural networks. Our dataset contains two parts: a) objects shot on a turntable: 15 categories, 8 rotation angles, 11 cameras on a semi-circular arch, 5 lighting conditions, 3 focus levels, variety of backgrounds (23.4 per instance) generating 1320 images per instance (about 22 million images in total), and b) scenes: in which a robotic arm takes pictures of objects on a 1:160 scale scene. We study: 1) invariance and selectivity of different CNN layers, 2) knowledge transfer from one object category to another, 3) systematic or random sampling of images to build a train set, 4) domain adaptation from synthetic to natural scenes, and 5) order of knowledge delivery to CNNs. We also discuss how our analyses can lead the field to develop more efficient deep learning methods.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"24 1","pages":"2221-2230"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84731860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 66
End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.335
Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
{"title":"End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation","authors":"Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang","doi":"10.1109/CVPR.2016.335","DOIUrl":"https://doi.org/10.1109/CVPR.2016.335","url":null,"abstract":"Recently, Deep Convolutional Neural Networks (DCNNs) have been applied to the task of human pose estimation, and have shown its potential of learning better feature representations and capturing contextual relationships. However, it is difficult to incorporate domain prior knowledge such as geometric relationships among body parts into DCNNs. In addition, training DCNN-based body part detectors without consideration of global body joint consistency introduces ambiguities, which increases the complexity of training. In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts. We explicitly incorporate domain prior knowledge into the framework, which greatly regularizes the learning process and enables the flexibility of our framework for loopy models or tree-structured models. The effectiveness of jointly learning a DCNN with a deformable mixture of parts model is evaluated through intensive experiments on several widely used benchmarks. The proposed approach significantly improves the performance compared with state-of-the-art approaches, especially on benchmarks with challenging articulations.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"75 1","pages":"3073-3082"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75702107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 233
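The deformable-mixture-of-parts side of the framework can be illustrated with a small max-sum dynamic program: given per-part score maps (which in the paper are produced by the DCNN and learned jointly, end to end), the best part locations on a chain-structured model are found under quadratic deformation costs. The synthetic score maps, chain structure and deformation weights below are assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, n_parts = 24, 24, 4
true_locs = [(5, 5), (9, 7), (13, 9), (17, 11)]          # chain of parts, constant offset

def score_map(loc, noise=0.3):
    ys, xs = np.mgrid[0:H, 0:W]
    peak = np.exp(-((ys - loc[0]) ** 2 + (xs - loc[1]) ** 2) / 4.0)
    return peak + noise * rng.random((H, W))

unary = [score_map(l) for l in true_locs]                 # stand-in DCNN part heatmaps
offset = np.array([4, 2])                                 # expected child - parent offset
alpha = 0.1                                               # quadratic deformation weight

ys, xs = np.mgrid[0:H, 0:W]
coords = np.stack([ys, xs], axis=-1).reshape(-1, 2)       # all (y, x) positions
msgs = np.zeros((n_parts, H * W))                         # messages child -> parent
back = np.zeros((n_parts, H * W), dtype=int)              # backpointers for decoding

# Max-sum dynamic programming from the last part of the chain back to the root.
for p in range(n_parts - 1, 0, -1):
    child_score = unary[p].ravel() + msgs[p]
    for idx, parent in enumerate(coords):
        pair = child_score - alpha * ((coords - parent - offset) ** 2).sum(axis=1)
        back[p, idx] = int(np.argmax(pair))
        msgs[p - 1, idx] = pair[back[p, idx]]

best = [int(np.argmax(unary[0].ravel() + msgs[0]))]       # root (part 0) location
for p in range(1, n_parts):
    best.append(int(back[p, best[-1]]))

print("estimated part locations:", [tuple(map(int, coords[b])) for b in best])
print("true part locations:     ", true_locs)
```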
Visual Path Prediction in Complex Scenes with Crowded Moving Objects
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.292
Y. Yoo, Kimin Yun, Sangdoo Yun, Jonghee Hong, Hawook Jeong, J. Choi
{"title":"Visual Path Prediction in Complex Scenes with Crowded Moving Objects","authors":"Y. Yoo, Kimin Yun, Sangdoo Yun, Jonghee Hong, Hawook Jeong, J. Choi","doi":"10.1109/CVPR.2016.292","DOIUrl":"https://doi.org/10.1109/CVPR.2016.292","url":null,"abstract":"This paper proposes a novel path prediction algorithm for progressing one step further than the existing works focusing on single target path prediction. In this paper, we consider moving dynamics of co-occurring objects for path prediction in a scene that includes crowded moving objects. To solve this problem, we first suggest a two-layered probabilistic model to find major movement patterns and their cooccurrence tendency. By utilizing the unsupervised learning results from the model, we present an algorithm to find the future location of any target object. Through extensive qualitative/quantitative experiments, we show that our algorithm can find a plausible future path in complex scenes with a large number of moving objects.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"88 1","pages":"2668-2677"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78113497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
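As a heavily simplified, single-layer stand-in for the two-layered model described above, the sketch below discovers movement patterns by clustering training trajectories without supervision and predicts a new target's future path by matching its observed past to the closest pattern and reading off that pattern's continuation. The co-occurrence layer of the actual model is not represented; all data and the clustering choice are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
T_past, T_future, n_tracks = 10, 10, 200

def make_track(direction):
    steps = direction + rng.normal(0, 0.05, size=(T_past + T_future, 2))
    return np.cumsum(steps, axis=0)

# Two dominant movement patterns in the training scene: rightward and upward.
tracks = np.array([make_track(d) for d in
                   [(1.0, 0.0)] * (n_tracks // 2) + [(0.0, 1.0)] * (n_tracks // 2)])

# Discover movement patterns by clustering full trajectories relative to their start.
flat = (tracks - tracks[:, :1]).reshape(n_tracks, -1)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(flat)
patterns = kmeans.cluster_centers_.reshape(2, T_past + T_future, 2)

def predict_future(observed_past):
    # Assign the partially observed track to the closest pattern, then read off
    # that pattern's continuation as the predicted future path.
    rel = observed_past - observed_past[:1]
    dists = [np.linalg.norm(rel - p[:T_past]) for p in patterns]
    best = patterns[int(np.argmin(dists))]
    return observed_past[-1] + (best[T_past:] - best[T_past - 1])

new_track = make_track((1.0, 0.0))
pred = predict_future(new_track[:T_past])
print("predicted final position:", pred[-1])
print("actual final position:   ", new_track[-1])
```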
HD Maps: Fine-Grained Road Segmentation by Parsing Ground and Aerial Images
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.393
G. Máttyus, Shenlong Wang, S. Fidler, R. Urtasun
{"title":"HD Maps: Fine-Grained Road Segmentation by Parsing Ground and Aerial Images","authors":"G. Máttyus, Shenlong Wang, S. Fidler, R. Urtasun","doi":"10.1109/CVPR.2016.393","DOIUrl":"https://doi.org/10.1109/CVPR.2016.393","url":null,"abstract":"In this paper we present an approach to enhance existing maps with fine grained segmentation categories such as parking spots and sidewalk, as well as the number and location of road lanes. Towards this goal, we propose an efficient approach that is able to estimate these fine grained categories by doing joint inference over both, monocular aerial imagery, as well as ground images taken from a stereo camera pair mounted on top of a car. Important to this is reasoning about the alignment between the two types of imagery, as even when the measurements are taken with sophisticated GPS+IMU systems, this alignment is not sufficiently accurate. We demonstrate the effectiveness of our approach on a new dataset which enhances KITTI [8] with aerial images taken with a camera mounted on an airplane and flying around the city of Karlsruhe, Germany.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"21 1","pages":"3611-3619"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79772098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 131
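One ingredient the abstract highlights is correcting the imperfect alignment between aerial and ground imagery. The toy sketch below estimates a 2-D misalignment by shifting a ground-derived road mask over an aerial road mask and keeping the translation with maximal overlap; the paper instead reasons about alignment jointly with the fine-grained labelling inside a single inference problem, so this is only an illustrative, assumption-laden stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 80, 80

aerial = np.zeros((H, W), dtype=bool)
aerial[:, 35:45] = True                         # a vertical road in the aerial map

true_shift = (3, -5)                            # unknown GPS/IMU error (dy, dx)
ground = np.roll(aerial, true_shift, axis=(0, 1))
ground ^= rng.random((H, W)) < 0.02             # a little observation noise

def overlap(a, b, dy, dx):
    # Score a candidate correction (dy, dx) by the overlap it produces.
    shifted = np.roll(b, (-dy, -dx), axis=(0, 1))
    return np.logical_and(a, shifted).sum()

search = range(-8, 9)
best = max(((dy, dx) for dy in search for dx in search),
           key=lambda s: overlap(aerial, ground, *s))
print("estimated misalignment (dy, dx):", best)   # expected to be close to (3, -5)
```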