2017 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Active Learning for Human Pose Estimation
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.468
Buyu Liu, V. Ferrari
{"title":"Active Learning for Human Pose Estimation","authors":"Buyu Liu, V. Ferrari","doi":"10.1109/ICCV.2017.468","DOIUrl":"https://doi.org/10.1109/ICCV.2017.468","url":null,"abstract":"Annotating human poses in realistic scenes is very time consuming, yet necessary for training human pose estimators. We propose to address this problem in an active learning framework, which alternates between requesting the most useful annotations among a large set of unlabelled images, and re-training the pose estimator. To this end, (1) we propose an uncertainty estimator specific for body joint predictions, which takes into account the spatial distribution of the responses of the current pose estimator on the unlabelled images; (2) we propose a dynamic combination of influence and uncertainty cues, where their weights vary during the active learning process according to the reliability of the current pose estimator; (3) we introduce a computer assisted annotation interface, which reduces the time necessary for a human annotator to click on a joint by discretizing the image into regions generated by the current pose estimator. Experiments using the MPII and LSP datasets with both simulated and real annotators show that (1) the proposed active selection scheme outperforms several baselines; (2) our computer-assisted interface can further reduce annotation effort; and (3) our technique can further improve the performance of a pose estimator even when starting from an already strong one.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"28 1","pages":"4373-4382"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84362176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 69
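The selection step described above scores unlabelled images by combining a joint-level uncertainty cue with an influence cue, using a weight that changes as the pose estimator becomes more reliable. The following Python sketch illustrates that idea only in spirit and is not the authors' estimator: the entropy-based uncertainty, the precomputed influence array and the fixed mixing weight alpha are illustrative assumptions.

import numpy as np

def joint_uncertainty(heatmap):
    # Spatial entropy of one joint's response map: diffuse responses = uncertain.
    p = heatmap.ravel() / (heatmap.sum() + 1e-12)
    return -(p * np.log(p + 1e-12)).sum()

def select_images(heatmaps, influence, alpha, k):
    # heatmaps: (n_images, n_joints, H, W); influence: (n_images,); alpha in [0, 1].
    unc = np.array([[joint_uncertainty(h) for h in img] for img in heatmaps]).mean(axis=1)
    unc = (unc - unc.min()) / (unc.max() - unc.min() + 1e-12)
    inf = (influence - influence.min()) / (influence.max() - influence.min() + 1e-12)
    score = alpha * unc + (1.0 - alpha) * inf      # dynamic combination of the two cues
    return np.argsort(-score)[:k]                  # images to send to the annotator

rng = np.random.default_rng(0)
toy_heatmaps = rng.random((100, 14, 32, 32))       # 100 unlabelled images, 14 joints
picked = select_images(toy_heatmaps, influence=rng.random(100), alpha=0.7, k=10)
print(picked)
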
Taking the Scenic Route to 3D: Optimising Reconstruction from Moving Cameras
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.501
Oscar Alejandro Mendez Maldonado, Simon Hadfield, N. Pugeault, R. Bowden
{"title":"Taking the Scenic Route to 3D: Optimising Reconstruction from Moving Cameras","authors":"Oscar Alejandro Mendez Maldonado, Simon Hadfield, N. Pugeault, R. Bowden","doi":"10.1109/ICCV.2017.501","DOIUrl":"https://doi.org/10.1109/ICCV.2017.501","url":null,"abstract":"Reconstruction of 3D environments is a problem that has been widely addressed in the literature. While many approaches exist to perform reconstruction, few of them take an active role in deciding where the next observations should come from. Furthermore, the problem of travelling from the camera's current position to the next, known as pathplanning, usually focuses on minimising path length. This approach is ill-suited for reconstruction applications, where learning about the environment is more valuable than speed of traversal. We present a novel Scenic Route Planner that selects paths which maximise information gain, both in terms of total map coverage and reconstruction accuracy. We also introduce a new type of collaborative behaviour into the planning stage called opportunistic collaboration, which allows sensors to switch between acting as independent Structure from Motion (SfM) agents or as a variable baseline stereo pair. We show that Scenic Planning enables similar performance to state-of-the-art batch approaches using less than 0.00027% of the possible stereo pairs (3% of the views). Comparison against length-based pathplanning approaches show that our approach produces more complete and more accurate maps with fewer frames. Finally, we demonstrate the Scenic Pathplanner's ability to generalise to live scenarios by mounting cameras on autonomous ground-based sensor platforms and exploring an environment.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"4 1","pages":"4687-4695"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87564983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
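A minimal sketch of information-gain-driven view selection, as opposed to shortest-path planning: each candidate view is scored by how many not-yet-covered map cells it would observe, and the route greedily picks the best view at each step. The boolean visibility masks and the purely greedy strategy are simplifying assumptions; the paper's planner also accounts for reconstruction accuracy and opportunistic stereo pairing.

import numpy as np

def expected_gain(coverage, view_mask):
    # Number of so-far-unseen map cells this candidate view would cover.
    return np.logical_and(view_mask, ~coverage).sum()

def scenic_route(candidate_views, n_steps):
    # candidate_views: (n_views, H, W) boolean visibility masks over a 2D map grid.
    coverage = np.zeros(candidate_views.shape[1:], dtype=bool)
    route = []
    for _ in range(n_steps):
        gains = [expected_gain(coverage, v) for v in candidate_views]
        best = int(np.argmax(gains))               # maximise information gain, not path length
        route.append(best)
        coverage |= candidate_views[best]
    return route, coverage.mean()

rng = np.random.default_rng(1)
views = rng.random((50, 64, 64)) > 0.8             # toy visibility masks
route, covered = scenic_route(views, n_steps=5)
print(route, f"coverage={covered:.2f}")
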
Attribute-Enhanced Face Recognition with Neural Tensor Fusion Networks
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.404
Guosheng Hu, Yang Hua, Yang Yuan, Zhihong Zhang, Zheng Lu, S. Mukherjee, Timothy M. Hospedales, N. Robertson, Yongxin Yang
{"title":"Attribute-Enhanced Face Recognition with Neural Tensor Fusion Networks","authors":"Guosheng Hu, Yang Hua, Yang Yuan, Zhihong Zhang, Zheng Lu, S. Mukherjee, Timothy M. Hospedales, N. Robertson, Yongxin Yang","doi":"10.1109/ICCV.2017.404","DOIUrl":"https://doi.org/10.1109/ICCV.2017.404","url":null,"abstract":"Deep learning has achieved great success in face recognition, however deep-learned features still have limited invariance to strong intra-personal variations such as large pose changes. It is observed that some facial attributes (e.g. eyebrow thickness, gender) are robust to such variations. We present the first work to systematically explore how the fusion of face recognition features (FRF) and facial attribute features (FAF) can enhance face recognition performance in various challenging scenarios. Despite the promise of FAF, we find that in practice existing fusion methods fail to leverage FAF to boost face recognition performance in some challenging scenarios. Thus, we develop a powerful tensor-based framework which formulates feature fusion as a tensor optimisation problem. It is nontrivial to directly optimise this tensor due to the large number of parameters to optimise. To solve this problem, we establish a theoretical equivalence between low-rank tensor optimisation and a two-stream gated neural network. This equivalence allows tractable learning using standard neural network optimisation tools, leading to accurate and stable optimisation. Experimental results show the fused feature works better than individual features, thus proving for the first time that facial attributes aid face recognition. We achieve state-of-the-art performance on three popular databases: MultiPIE (cross pose, lighting and expression), CASIA NIR-VIS2.0 (cross-modality environment) and LFW (uncontrolled environment).","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"3764-3773"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85155509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
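The paper's key device is the equivalence between low-rank tensor fusion and a two-stream gated network. The toy PyTorch module below shows only the generic gated two-stream idea, not the authors' low-rank parameterisation; the dimensions (512-D recognition features, 40-D attribute features) and the sigmoid gate are illustrative assumptions.

import torch
import torch.nn as nn

class GatedTwoStreamFusion(nn.Module):
    # Gated fusion of face-recognition features (FRF) and facial-attribute features (FAF).
    def __init__(self, frf_dim, faf_dim, out_dim):
        super().__init__()
        self.frf_proj = nn.Linear(frf_dim, out_dim, bias=False)
        self.faf_proj = nn.Linear(faf_dim, out_dim, bias=False)
        self.gate = nn.Sequential(nn.Linear(frf_dim + faf_dim, out_dim), nn.Sigmoid())

    def forward(self, frf, faf):
        g = self.gate(torch.cat([frf, faf], dim=1))            # per-dimension mixing weights
        return g * self.frf_proj(frf) + (1 - g) * self.faf_proj(faf)

fusion = GatedTwoStreamFusion(frf_dim=512, faf_dim=40, out_dim=256)
fused = fusion(torch.randn(8, 512), torch.randn(8, 40))
print(fused.shape)                                             # torch.Size([8, 256])
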
Rolling Shutter Correction in Manhattan World
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.101
Pulak Purkait, C. Zach, A. Leonardis
{"title":"Rolling Shutter Correction in Manhattan World","authors":"Pulak Purkait, C. Zach, A. Leonardis","doi":"10.1109/ICCV.2017.101","DOIUrl":"https://doi.org/10.1109/ICCV.2017.101","url":null,"abstract":"A vast majority of consumer cameras operate the rolling shutter mechanism, which often produces distorted images due to inter-row delay while capturing an image. Recent methods for monocular rolling shutter compensation utilize blur kernel, straightness of line segments, as well as angle and length preservation. However, they do not incorporate scene geometry explicitly for rolling shutter correction, therefore, information about the 3D scene geometry is often distorted by the correction process. In this paper we propose a novel method which leverages geometric properties of the scene—in particular vanishing directions—to estimate the camera motion during rolling shutter exposure from a single distorted image. The proposed method jointly estimates the orthogonal vanishing directions and the rolling shutter camera motion. We performed extensive experiments on synthetic and real datasets which demonstrate the benefits of our approach both in terms of qualitative and quantitative results (in terms of a geometric structure fitting) as well as with respect to computation time.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"53 1","pages":"882-890"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83098527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
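To make the row-wise distortion model concrete, here is a heavily simplified numpy sketch: each image row is assumed to be exposed slightly later than the previous one, so it sees the scene under a slightly different rotation, and a pixel can be mapped back to the first-row pose once that rotation is known. In the paper the motion is estimated jointly with the vanishing directions; here the angular velocity omega, the intrinsics and the row readout time are simply assumed, so this is an illustration of the camera model, not the estimation method.

import numpy as np

def rotation(axis_angle):
    # Rodrigues formula: axis-angle vector -> 3x3 rotation matrix.
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-12:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def correct_pixel(u, v, K_intr, omega, row_time):
    # Map a distorted pixel (u, v) back to the pose of the first row, assuming a
    # constant angular velocity omega (rad/s) during the rolling-shutter readout.
    R_row = rotation(omega * (v * row_time))       # rotation accumulated by row v
    ray = np.linalg.inv(K_intr) @ np.array([u, v, 1.0])
    ray = R_row.T @ ray                            # undo the per-row rotation
    uvw = K_intr @ ray
    return uvw[:2] / uvw[2]

K_intr = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
print(correct_pixel(400, 300, K_intr, omega=np.array([0.0, 0.3, 0.0]), row_time=1 / 20000))
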
Visual Odometry for Pixel Processor Arrays
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.493
Laurie Bose, Jianing Chen, S. Carey, P. Dudek, W. Mayol-Cuevas
{"title":"Visual Odometry for Pixel Processor Arrays","authors":"Laurie Bose, Jianing Chen, S. Carey, P. Dudek, W. Mayol-Cuevas","doi":"10.1109/ICCV.2017.493","DOIUrl":"https://doi.org/10.1109/ICCV.2017.493","url":null,"abstract":"We present an approach of estimating constrained egomotion on a Pixel Processor Array (PPA). These devices embed processing and data storage capability into the pixels of the image sensor, allowing for fast and low power parallel computation directly on the image-plane. Rather than the standard visual pipeline whereby whole images are transferred to an external general processing unit, our approach performs all computation upon the PPA itself, with the camera's estimated motion as the only information output. Our approach estimates 3D rotation and a 1D scale-less estimate of translation. We introduce methods of image scaling, rotation and alignment which are performed solely upon the PPA itself and form the basis for conducting motion estimation. We demonstrate the algorithms on a SCAMP-5 vision chip, achieving frame rates >1000Hz at ~2W power consumption.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"99 1","pages":"4614-4622"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77212750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
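The method's core operations are image scaling, rotation and alignment carried out directly on the focal-plane processor. As a rough desktop stand-in for the alignment step only, the numpy sketch below performs an exhaustive search for the inter-frame image shift; it is not the SCAMP-5 implementation and it recovers a 2D shift rather than the 3D rotation and scale-less translation the paper estimates.

import numpy as np

def align_score(a, b, dx, dy):
    # Mean absolute difference between a and b after shifting b by (dx, dy).
    h, w = a.shape
    xa = a[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    xb = b[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
    return np.abs(xa.astype(float) - xb.astype(float)).mean()

def best_shift(prev, curr, search=4):
    # Exhaustive search over small shifts, mimicking on-chip image alignment.
    scores = {(dx, dy): align_score(prev, curr, dx, dy)
              for dx in range(-search, search + 1)
              for dy in range(-search, search + 1)}
    return min(scores, key=scores.get)

rng = np.random.default_rng(2)
frame = rng.integers(0, 255, (64, 64))
moved = np.roll(frame, shift=(1, -2), axis=(0, 1))  # simulated camera motion
print(best_shift(frame, moved))                     # shift that re-aligns the two frames
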
Joint Learning of Object and Action Detectors
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.219
Vicky S. Kalogeiton, Philippe Weinzaepfel, V. Ferrari, C. Schmid
{"title":"Joint Learning of Object and Action Detectors","authors":"Vicky S. Kalogeiton, Philippe Weinzaepfel, V. Ferrari, C. Schmid","doi":"10.1109/ICCV.2017.219","DOIUrl":"https://doi.org/10.1109/ICCV.2017.219","url":null,"abstract":"While most existing approaches for detection in videos focus on objects or human actions separately, we aim at jointly detecting objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting objects-actions in videos, and show that both tasks of object and action detection benefit from this joint learning. Moreover, the proposed architecture can be used for zero-shot learning of actions: our multitask objective leverages the commonalities of an action performed by different objects, e.g. dog and cat jumping, enabling to detect actions of an object without training with these object-actions pairs. In experiments on the A2D dataset [50], we obtain state-of-the-art results on segmentation of object-action pairs. We finally apply our multitask architecture to detect visual relationships between objects in images of the VRD dataset [24].","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"36 1","pages":"2001-2010"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85857365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 63
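At its simplest, a joint objective of this kind is two classifiers on a shared feature with their losses summed, so that supervision from both the object task and the action task shapes the same representation. The PyTorch sketch below shows only that multitask loss on random stand-in features; the actual detection machinery (proposals, tubes, per-box features) is omitted and all dimensions are illustrative.

import torch
import torch.nn as nn

class JointObjectActionHead(nn.Module):
    # One shared feature, two task-specific classifiers (objects and actions).
    def __init__(self, feat_dim, n_objects, n_actions):
        super().__init__()
        self.obj_cls = nn.Linear(feat_dim, n_objects)
        self.act_cls = nn.Linear(feat_dim, n_actions)

    def forward(self, feats):
        return self.obj_cls(feats), self.act_cls(feats)

head = JointObjectActionHead(feat_dim=256, n_objects=10, n_actions=9)
feats = torch.randn(16, 256)                        # stand-in for per-box video features
obj_logits, act_logits = head(feats)
ce = nn.CrossEntropyLoss()
loss = ce(obj_logits, torch.randint(0, 10, (16,))) + ce(act_logits, torch.randint(0, 9, (16,)))
loss.backward()                                     # gradients flow through both task heads
print(float(loss))
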
Sketching with Style: Visual Search with Sketches and Aesthetic Context
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.290
J. Collomosse, Tu Bui, Michael J. Wilber, Chen Fang, Hailin Jin
{"title":"Sketching with Style: Visual Search with Sketches and Aesthetic Context","authors":"J. Collomosse, Tu Bui, Michael J. Wilber, Chen Fang, Hailin Jin","doi":"10.1109/ICCV.2017.290","DOIUrl":"https://doi.org/10.1109/ICCV.2017.290","url":null,"abstract":"We propose a novel measure of visual similarity for image retrieval that incorporates both structural and aesthetic (style) constraints. Our algorithm accepts a query as sketched shape, and a set of one or more contextual images specifying the desired visual aesthetic. A triplet network is used to learn a feature embedding capable of measuring style similarity independent of structure, delivering significant gains over previous networks for style discrimination. We incorporate this model within a hierarchical triplet network to unify and learn a joint space from two discriminatively trained streams for style and structure. We demonstrate that this space enables, for the first time, styleconstrained sketch search over a diverse domain of digital artwork comprising graphics, paintings and drawings. We also briefly explore alternative query modalities.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"8 1","pages":"2679-2687"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84133489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 54
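The style stream is trained with a triplet objective: an anchor image should embed closer to a positive sharing its visual style than to a negative of a different style. The sketch below shows just such a triplet loss on a toy embedding network; the hierarchical combination with the structure (sketch) stream described in the abstract is not reproduced, and the feature sizes and margin are assumptions.

import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))

def l2norm(x):
    return x / x.norm(dim=1, keepdim=True).clamp_min(1e-12)

# Anchor shares the desired style with the positive but not with the negative.
anchor, positive, negative = (l2norm(embed(torch.randn(32, 512))) for _ in range(3))
loss = nn.functional.triplet_margin_loss(anchor, positive, negative, margin=0.2)
loss.backward()
print(float(loss))
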
Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.592
Jifei Song, Qian Yu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales
{"title":"Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval","authors":"Jifei Song, Qian Yu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales","doi":"10.1109/ICCV.2017.592","DOIUrl":"https://doi.org/10.1109/ICCV.2017.592","url":null,"abstract":"Human sketches are unique in being able to capture both the spatial topology of a visual object, as well as its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) importantly leverages on such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos which in turn make subtle visual detail matching difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning, yet ignore explicitly accounting for fine-grained details and their spatial context. In this paper, a novel deep FG-SBIR model is proposed which differs significantly from the existing models in that: (1) It is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details: (2) It combines coarse and fine semantic information via a shortcut connection fusion block: and (3) It models feature correlation and is robust to misalignments between the extracted features across the two domains by introducing a novel higher-order learnable energy function (HOLEF) based loss. Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state-of-the-art.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"35 1","pages":"5552-5561"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89354597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 196
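Two of the three ingredients, spatial attention over the feature map and a shortcut that keeps the coarse holistic descriptor, can be sketched in a few lines of PyTorch; the HOLEF loss is not reproduced here. The channel count, the 1x1-convolution attention and the concatenation-based fusion are illustrative choices, not the paper's exact design.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Weight feature-map locations by a learned attention map; keep a coarse shortcut.
    def __init__(self, channels):
        super().__init__()
        self.att = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, fmap):
        logits = self.att(fmap)                                   # (N, 1, H, W)
        a = torch.softmax(logits.flatten(2), dim=2).view_as(logits)
        fine = (fmap * a).sum(dim=(2, 3))                         # attended fine-grained descriptor
        coarse = fmap.mean(dim=(2, 3))                            # holistic descriptor (shortcut)
        return torch.cat([fine, coarse], dim=1)                   # fused representation

module = SpatialAttention(channels=64)
print(module(torch.randn(4, 64, 14, 14)).shape)                   # torch.Size([4, 128])
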
Corner-Based Geometric Calibration of Multi-focus Plenoptic Cameras
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.109
Sotiris Nousias, F. Chadebecq, Jonas Pichat, P. Keane, S. Ourselin, C. Bergeles
{"title":"Corner-Based Geometric Calibration of Multi-focus Plenoptic Cameras","authors":"Sotiris Nousias, F. Chadebecq, Jonas Pichat, P. Keane, S. Ourselin, C. Bergeles","doi":"10.1109/ICCV.2017.109","DOIUrl":"https://doi.org/10.1109/ICCV.2017.109","url":null,"abstract":"We propose a method for geometric calibration of multifocus plenoptic cameras using raw images. Multi-focus plenoptic cameras feature several types of micro-lenses spatially aligned in front of the camera sensor to generate micro-images at different magnifications. This multi-lens arrangement provides computational-photography benefits but complicates calibration. Our methodology achieves the detection of the type of micro-lenses, the retrieval of their spatial arrangement, and the estimation of intrinsic and extrinsic camera parameters therefore fully characterising this specialised camera class. Motivated from classic pinhole camera calibration, our algorithm operates on a checker-board’s corners, retrieved by a custom microimage corner detector. This approach enables the introduction of a reprojection error that is used in a minimisation framework. Our algorithm compares favourably to the state-of-the-art, as demonstrated by controlled and freehand experiments, making it a first step towards accurate 3D reconstruction and Structure-from-Motion.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"957-965"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89027106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
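The calibration is driven by a reprojection error over detected checkerboard corners, minimised with a non-linear least-squares solver. The scipy sketch below shows that principle for a plain pinhole model with only a focal length and a translation as unknowns; the micro-lens types, their arrangement and the full intrinsic/extrinsic parameter set handled by the paper are deliberately left out, so this is an illustration of the optimisation principle, not the authors' calibration.

import numpy as np
from scipy.optimize import least_squares

board = np.array([[x, y, 0.0] for y in range(4) for x in range(5)])   # planar checkerboard corners
f_true, t_true = 700.0, np.array([0.1, -0.2, 5.0])
cam_true = board + t_true
observed = f_true * cam_true[:, :2] / cam_true[:, 2:3]                # synthetic corner detections

def residuals(params):
    # Reprojection error of a toy pinhole model: focal length + 3D translation.
    f, tx, ty, tz = params
    cam = board + np.array([tx, ty, tz])
    proj = f * cam[:, :2] / cam[:, 2:3]
    return (proj - observed).ravel()

fit = least_squares(residuals, x0=[500.0, 0.0, 0.0, 3.0])
print(fit.x)   # should be close to [700, 0.1, -0.2, 5.0]
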
Learning Action Recognition Model from Depth and Skeleton Videos
2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-12-22 DOI: 10.1109/ICCV.2017.621
H. Rahmani, Bennamoun
{"title":"Learning Action Recognition Model from Depth and Skeleton Videos","authors":"H. Rahmani, Bennamoun","doi":"10.1109/ICCV.2017.621","DOIUrl":"https://doi.org/10.1109/ICCV.2017.621","url":null,"abstract":"Depth sensors open up possibilities of dealing with the human action recognition problem by providing 3D human skeleton data and depth images of the scene. Analysis of human actions based on 3D skeleton data has become popular recently, due to its robustness and view-invariant representation. However, the skeleton alone is insufficient to distinguish actions which involve human-object interactions. In this paper, we propose a deep model which efficiently models human-object interactions and intra-class variations under viewpoint changes. First, a human body-part model is introduced to transfer the depth appearances of body-parts to a shared view-invariant space. Second, an end-to-end learning framework is proposed which is able to effectively combine the view-invariant body-part representation from skeletal and depth images, and learn the relations between the human body-parts and the environmental objects, the interactions between different human body-parts, and the temporal structure of human actions. We have evaluated the performance of our proposed model against 15 existing techniques on two large benchmark human action recognition datasets including NTU RGB+D and UWA3DII. The Experimental results show that our technique provides a significant improvement over state-of-the-art methods.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"9 1","pages":"5833-5842"},"PeriodicalIF":0.0,"publicationDate":"2017-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80713334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 97
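A bare-bones version of combining the two modalities: per-frame skeleton vectors and per-frame depth body-part features are each projected, summed, average-pooled over time and classified. This is only a fusion skeleton under assumed dimensions (25 joints, 128-D depth features, 60 classes); the paper's view-invariant body-part transfer and its modelling of human-object relations are not shown.

import torch
import torch.nn as nn

class DepthSkeletonFusion(nn.Module):
    # Fuse per-frame skeleton and depth body-part features, then classify the clip.
    def __init__(self, skel_dim, depth_dim, hidden, n_classes):
        super().__init__()
        self.skel_fc = nn.Linear(skel_dim, hidden)
        self.depth_fc = nn.Linear(depth_dim, hidden)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, skel, depth):
        # skel: (N, T, skel_dim), depth: (N, T, depth_dim)
        h = torch.relu(self.skel_fc(skel)) + torch.relu(self.depth_fc(depth))
        return self.cls(h.mean(dim=1))             # average-pool over time, then classify

model = DepthSkeletonFusion(skel_dim=25 * 3, depth_dim=128, hidden=256, n_classes=60)
logits = model(torch.randn(2, 30, 75), torch.randn(2, 30, 128))
print(logits.shape)                                # torch.Size([2, 60])
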