2017 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Shape Inpainting Using 3D Generative Adversarial Network and Recurrent Convolutional Networks
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-11-17 | DOI: 10.1109/ICCV.2017.252
Authors: Weiyue Wang, Qiangui Huang, Suya You, Chao Yang, U. Neumann
Abstract: Recent advances in convolutional neural networks have shown promising results in 3D shape completion, but due to GPU memory limitations these methods can only produce low-resolution outputs. To inpaint 3D models with semantic plausibility and contextual details, we introduce a hybrid framework that combines a 3D Encoder-Decoder Generative Adversarial Network (3D-ED-GAN) and a Long-term Recurrent Convolutional Network (LRCN). The 3D-ED-GAN is a 3D convolutional neural network trained with a generative adversarial paradigm to fill in missing 3D data at low resolution. The LRCN adopts a recurrent neural network architecture to minimize GPU memory usage and incorporates an encoder-decoder pair into a Long Short-Term Memory network. By handling the 3D model as a sequence of 2D slices, the LRCN transforms a coarse 3D shape into a more complete and higher-resolution volume. While the 3D-ED-GAN captures the global contextual structure of the 3D shape, the LRCN localizes the fine-grained details. Experimental results on both real-world and synthetic data show that reconstructions from corrupted models result in complete, high-resolution 3D objects.
Pages: 2317-2325 | Citations: 150
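The abstract above describes an encoder-decoder generator operating on voxel grids and trained adversarially. Below is a minimal PyTorch sketch of such a voxel-in/voxel-out generator; the 32^3 resolution, channel widths, and class name are illustrative assumptions rather than the authors' exact 3D-ED-GAN, and the adversarial discriminator and the LRCN refinement stage are omitted.

```python
import torch
import torch.nn as nn

class VoxelEncoderDecoder(nn.Module):
    """Toy 3D encoder-decoder generator: maps a corrupted 32^3 occupancy grid
    to a completed 32^3 grid. All sizes are illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # 1 x 32^3 -> 256 x 2^3
            nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.BatchNorm3d(64), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.BatchNorm3d(128), nn.LeakyReLU(0.2),
            nn.Conv3d(128, 256, 4, stride=2, padding=1), nn.BatchNorm3d(256), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(  # 256 x 2^3 -> 1 x 32^3
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.BatchNorm3d(32), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # occupancy in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

corrupted = torch.rand(2, 1, 32, 32, 32)      # batch of corrupted voxel grids
completed = VoxelEncoderDecoder()(corrupted)  # same resolution, filled in
print(completed.shape)                        # torch.Size([2, 1, 32, 32, 32])
```

In the paper this generator is trained against a 3D discriminator, and its low-resolution output is then sliced and refined by the LRCN; both stages are left out of this sketch.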
Synergy between Face Alignment and Tracking via Discriminative Global Consensus Optimization
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-26 | DOI: 10.1109/ICCV.2017.409
Authors: M. H. Khan, J. McDonagh, Georgios Tzimiropoulos
Abstract: An open question in facial landmark localization in video is whether one should perform tracking or tracking-by-detection (i.e., face alignment). Tracking produces fittings of high accuracy but is prone to drifting; tracking-by-detection is drift-free but results in low-accuracy fittings. To address this problem, we describe the first, to the best of our knowledge, synergistic approach between detection (face alignment) and tracking, which completely eliminates drifting from face tracking and does not merely perform tracking-by-detection. Our first main contribution is to show that one can achieve this synergy between detection and tracking using a principled optimization framework based on the theory of Global Variable Consensus Optimization using ADMM. Our second contribution is to show how the proposed analytic framework can be integrated within state-of-the-art discriminative methods for face alignment and tracking based on cascaded regression and deeply learned features. Overall, we call our method the Discriminative Global Consensus Model (DGCM). Our third contribution is to show that DGCM achieves a large performance improvement over the currently best-performing face tracking methods on the most challenging category of the 300-VW dataset.
Pages: 3811-3819 | Citations: 36
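The core optimization tool named above is Global Variable Consensus Optimization solved with ADMM. The following is a minimal, generic consensus-ADMM loop on two toy quadratic terms, standing in loosely for a detection term and a tracking term that must agree on one landmark vector; the quadratic objectives, penalty rho, and iteration count are assumptions for illustration, not the DGCM objective itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, iters = 4, 1.0, 100
# Two toy quadratic terms f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((6, n)) for _ in range(2)]
b = [rng.standard_normal(6) for _ in range(2)]

x = [np.zeros(n) for _ in A]   # local copies (one per term)
u = [np.zeros(n) for _ in A]   # scaled dual variables
z = np.zeros(n)                # global consensus variable

for _ in range(iters):
    # local updates: argmin f_i(x) + (rho/2) * ||x - z + u_i||^2
    x = [np.linalg.solve(Ai.T @ Ai + rho * np.eye(n),
                         Ai.T @ bi + rho * (z - ui))
         for Ai, bi, ui in zip(A, b, u)]
    z = np.mean([xi + ui for xi, ui in zip(x, u)], axis=0)   # consensus update
    u = [ui + xi - z for ui, xi in zip(u, x)]                # dual ascent

print(z)  # at convergence, every local copy agrees with z
```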
Multi-view Dynamic Shape Refinement Using Local Temporal Integration
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-22 | DOI: 10.1109/ICCV.2017.336
Authors: Vincent Leroy, Jean-Sébastien Franco, Edmond Boyer
Abstract: We consider 4D shape reconstruction in multi-view environments and investigate how to exploit temporal redundancy for precision refinement. In addition to being beneficial to many dynamic multi-view scenarios, this also enables larger scenes, where the increased precision can compensate for the reduced spatial resolution per image frame. With precision and scalability in mind, we propose a symmetric (non-causal) local time-window geometric integration scheme over temporal sequences, where shape reconstructions are refined framewise by warping local and reliable geometric regions of neighboring frames to them. This is in contrast to recent comparable approaches targeting a different context with more compact scenes and real-time applications. These usually use a single dense volumetric update space or geometric template, which they causally track and update globally frame by frame, with limitations in scalability for larger scenes and, for template-based strategies, in topology and precision. Our templateless and local approach is a first step towards temporal shape super-resolution. We show that it improves reconstruction accuracy by considering multiple frames. To this purpose, and in addition to real data examples, we introduce a multi-camera synthetic dataset that provides ground-truth data for mid-scale dynamic scenes.
Pages: 3113-3122 | Citations: 67
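The refinement step integrates reliable geometry from a symmetric window of neighboring frames onto the current frame. The toy sketch below shows only the symmetric-window, reliability-weighted fusion on depth maps; the warping of neighbors onto the reference frame is omitted, and the window radius and weighting scheme are assumptions.

```python
import numpy as np

def fuse_symmetric_window(depths, confidences, t, radius=2):
    """Refine the depth map of frame t by a confidence-weighted average over a
    symmetric (non-causal) window of neighboring frames. In the actual method
    the neighbors are first warped onto frame t; that step is omitted here."""
    lo, hi = max(0, t - radius), min(len(depths), t + radius + 1)
    d = np.stack(depths[lo:hi])            # (window, H, W)
    w = np.stack(confidences[lo:hi])       # per-pixel reliability in [0, 1]
    return (w * d).sum(axis=0) / np.clip(w.sum(axis=0), 1e-6, None)

# toy sequence: noisy depth maps of a static plane, with random confidences
rng = np.random.default_rng(1)
depths = [2.0 + 0.05 * rng.standard_normal((4, 4)) for _ in range(7)]
confs = [rng.uniform(0.5, 1.0, (4, 4)) for _ in range(7)]
print(fuse_symmetric_window(depths, confs, t=3))
```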
Detect to Track and Track to Detect
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-11 | DOI: 10.1109/ICCV.2017.330
Authors: Christoph Feichtenhofer, A. Pinz, Andrew Zisserman
Abstract: Recent approaches for high-accuracy detection and tracking of object categories in video consist of complex multistage solutions that become more cumbersome each year. In this paper we propose a ConvNet architecture that jointly performs detection and tracking, solving the task in a simple and effective way. Our contributions are threefold: (i) we set up a ConvNet architecture for simultaneous detection and tracking, using a multi-task objective for frame-based object detection and across-frame track regression; (ii) we introduce correlation features that represent object co-occurrences across time to aid the ConvNet during tracking; and (iii) we link the frame-level detections based on our across-frame tracklets to produce high-accuracy detections at the video level. Our ConvNet architecture for spatiotemporal object detection is evaluated on the large-scale ImageNet VID dataset, where it achieves state-of-the-art results. Our approach provides better single-model performance than the winning method of the last ImageNet challenge while being conceptually much simpler. Finally, we show that by increasing the temporal stride we can dramatically increase the tracker speed.
Pages: 3057-3065 | Citations: 485
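Contribution (ii) is a correlation layer that compares feature maps of neighboring frames over a range of displacements. A minimal PyTorch sketch of such local cross-correlation features follows, assuming a small maximum displacement and unit stride; the paper's exact layer configuration may differ.

```python
import torch
import torch.nn.functional as F

def correlation_features(feat_t, feat_tau, max_disp=4):
    """Local cross-correlation between feature maps of two frames.
    feat_t, feat_tau: (B, C, H, W). Returns (B, (2*max_disp+1)^2, H, W),
    one channel per spatial displacement (di, dj) of the second frame."""
    B, C, H, W = feat_t.shape
    padded = F.pad(feat_tau, [max_disp] * 4)             # pad width then height
    out = []
    for di in range(2 * max_disp + 1):
        for dj in range(2 * max_disp + 1):
            shifted = padded[:, :, di:di + H, dj:dj + W]
            out.append((feat_t * shifted).mean(dim=1))   # dot product over channels
    return torch.stack(out, dim=1)

corr = correlation_features(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16))
print(corr.shape)  # torch.Size([1, 81, 16, 16])
```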
Deeper, Broader and Artier Domain Generalization
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-09 | DOI: 10.1109/ICCV.2017.591
Authors: Da Li, Yongxin Yang, Yi-Zhe Song, Timothy M. Hospedales
Abstract: The problem of domain generalization is to learn from multiple training domains and extract a domain-agnostic model that can then be applied to an unseen domain. Domain generalization (DG) has a clear motivation in contexts where there are target domains with distinct characteristics yet sparse training data, for example recognition in sketch images, which are distinctly more abstract and rarer than photos. Nevertheless, DG methods have primarily been evaluated on photo-only benchmarks focusing on alleviating dataset bias, where both problems of domain distinctiveness and data sparsity can be minimal. We argue that these benchmarks are overly straightforward, and show that simple deep learning baselines perform surprisingly well on them. In this paper, we make two main contributions: firstly, we build upon the favorable domain-shift-robust properties of deep learning methods and develop a low-rank parameterized CNN model for end-to-end DG learning; secondly, we develop a DG benchmark dataset covering photo, sketch, cartoon and painting domains. This is both more practically relevant and harder (bigger domain shift) than existing benchmarks. The results show that our method outperforms existing DG alternatives, and our dataset provides a more significant DG challenge to drive future research.
Pages: 5543-5551 | Citations: 1022
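The first contribution is a low-rank parameterized CNN whose per-domain parameters share a common factorization. The sketch below illustrates the idea on a single linear layer: each training domain's weight matrix is a learned combination of a small shared basis, and an averaged combination serves as the domain-agnostic model at test time. The rank, the layer type, and the averaging rule for unseen domains are assumptions, not the paper's exact tensor factorization.

```python
import torch
import torch.nn as nn

class LowRankDomainLinear(nn.Module):
    """One linear layer whose weights are synthesized per training domain from a
    shared rank-K basis: W_d = sum_k alpha[d, k] * B_k. A toy stand-in for a
    low-rank parameterized CNN layer."""
    def __init__(self, in_dim, out_dim, num_domains, rank=3):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(rank, out_dim, in_dim) * 0.02)  # shared basis
        self.alpha = nn.Parameter(torch.ones(num_domains, rank) / rank)       # per-domain mixing
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x, domain=None):
        # domain=None: domain-agnostic weights (average mixing) for an unseen domain
        a = self.alpha.mean(dim=0) if domain is None else self.alpha[domain]
        W = torch.einsum('k,koi->oi', a, self.basis)
        return x @ W.t() + self.bias

layer = LowRankDomainLinear(in_dim=128, out_dim=7, num_domains=3)
features = torch.randn(5, 128)
print(layer(features, domain=1).shape)  # training pass for source domain 1
print(layer(features).shape)            # test pass on an unseen domain
```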
Depth Estimation Using Structured Light Flow — Analysis of Projected Pattern Flow on an Object’s Surface
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-02 | DOI: 10.1109/ICCV.2017.497
Authors: Furukawa Ryo, R. Sagawa, Hiroshi Kawasaki
Abstract: Shape reconstruction techniques using structured light have been widely researched and developed due to their robustness, high precision, and density. Because the techniques are based on decoding a pattern to find correspondences, they implicitly require that the projected patterns be clearly captured by an image sensor, i.e., without defocus or motion blur of the projected pattern. Although intensive research has been conducted on defocus blur, little work addresses motion blur, and the only existing solution is to capture with an extremely fast shutter speed. In this paper, unlike previous approaches, we actively utilize motion blur, which we refer to as a light flow, to estimate depth. Analysis reveals that a minimum of two light flows, retrieved from two patterns projected onto the object, are required for depth estimation. To retrieve two light flows at the same time, two sets of parallel line patterns are illuminated from two video projectors and the size of the motion blur of each line is precisely measured. By analyzing the light flows, i.e., the lengths of the blurs, scene depth information is estimated. In the experiments, 3D shapes of fast-moving objects, which are inevitably captured with motion blur, are successfully reconstructed by our technique.
Pages: 4650-4658 | Citations: 28
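The method measures the length of the motion blur (the "light flow") of each projected line and derives depth from two such flows. The toy snippet below only measures the blur extent along one image row with a simple threshold; the threshold is an assumption, and the paper's analysis relating two light flows to depth is not reproduced here.

```python
import numpy as np

def light_flow_length(row_profile, threshold=0.1):
    """Toy measurement of the 'light flow': the horizontal extent (in pixels)
    over which a projected line smears due to motion blur in one image row."""
    lit = np.flatnonzero(row_profile > threshold * row_profile.max())
    return 0 if lit.size == 0 else int(lit[-1] - lit[0] + 1)

# synthetic row: a projected line smeared over 9 pixels by object motion
row = np.zeros(64)
row[20:29] = 1.0 / 9
print(light_flow_length(row))  # 9

# The paper shows that two such flows, from patterns cast by two projectors,
# suffice to recover depth; that derivation is not repeated in this sketch.
```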
Temporal Shape Super-Resolution by Intra-frame Motion Encoding Using High-fps Structured Light
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-02 | DOI: 10.1109/ICCV.2017.22
Authors: Yuki Shiba, S. Ono, Furukawa Ryo, S. Hiura, Hiroshi Kawasaki
Abstract: One solution for depth imaging of a moving scene is to project a static pattern on the object and use just a single image for reconstruction. However, if the motion of the object is too fast with respect to the exposure time of the image sensor, patterns in the captured image are blurred and reconstruction fails. In this paper, we impose multiple projection patterns onto each single captured image to realize temporal super-resolution of the depth image sequences. With our method, multiple patterns are projected onto the object at a higher fps than is possible with a camera. In this case, the observed pattern varies depending on the depth and motion of the object, so we can extract temporal information of the scene from each single image. The decoding process is realized using a learning-based approach in which no geometric calibration is needed. Experiments confirm the effectiveness of our method, where sequential shapes are reconstructed from a single image. Quantitative evaluations and comparisons with recent techniques were also conducted.
Pages: 115-123 | Citations: 4
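The decoding step is learning-based: a network maps a single image containing several superimposed pattern exposures to the corresponding sub-frame shapes. The sketch below is a toy regressor of K sub-frame depth maps from one captured image; the architecture, K = 4, and the L1 loss are assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

K = 4  # number of sub-frame depth maps recovered per captured image (assumed)

# Toy learning-based decoder: one captured image containing K superimposed
# pattern exposures in, K depth maps out. The architecture is illustrative only.
decoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, K, 3, padding=1),
)

captured = torch.rand(8, 1, 64, 64)   # batch of single blurred-pattern images
target = torch.rand(8, K, 64, 64)     # synthetic ground-truth sub-frame depths
loss = nn.functional.l1_loss(decoder(captured), target)
loss.backward()                       # one training step of the toy decoder
print(loss.item())
```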
Misalignment-Robust Joint Filter for Cross-Modal Image Pairs
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.357
Authors: Takashi Shibata, Masayuki Tanaka, M. Okutomi
Abstract: Although several powerful joint filters for cross-modal image pairs have been proposed, the existing joint filters generate severe artifacts when there are misalignments between the target and guidance images. Our goal is to generate an artifact-free output image even from misaligned target and guidance images. We propose a novel misalignment-robust joint filter based on weight-volume-based image composition and a joint-filter cost volume. Our proposed method first generates a set of translated guidance images. Next, the joint-filter cost volume and a set of filtered images are computed from the target image and the set of translated guidances. Then, a weight volume is obtained from the joint-filter cost volume while considering spatial smoothness and label sparseness. The final output image is composed by fusing the set of filtered images with the weight volume. The key is to generate the final output image directly from the set of filtered images by weighted averaging, using the weight volume obtained from the joint-filter cost volume. The proposed framework is widely applicable and can involve any kind of joint filter. Experimental results show that the proposed method is effective for various applications including image denoising, image up-sampling, haze removal and depth map interpolation.
Pages: 3315-3324 | Citations: 16
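The pipeline above (translated guidances, joint-filter cost volume, weight volume, weighted-average composition) can be illustrated with a toy NumPy version. The joint bilateral filter stand-in, the absolute-difference cost, the Gaussian smoothing of the cost volume, and the per-pixel softmax in place of the paper's smoothness/sparseness optimization are all assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def joint_bilateral(target, guidance, radius=2, sigma=0.1):
    """Stand-in joint filter: range weights come from the guidance image."""
    num = np.zeros_like(target)
    den = np.zeros_like(target)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            g = np.roll(guidance, (dy, dx), axis=(0, 1))
            t = np.roll(target, (dy, dx), axis=(0, 1))
            w = np.exp(-(guidance - g) ** 2 / (2 * sigma ** 2))
            num += w * t
            den += w
    return num / den

def misalignment_robust_filter(target, guidance, max_shift=2, smooth=1.0, beta=10.0):
    """Toy weight-volume composition: filter the target with each translated
    guidance, score each translation by a spatially smoothed fidelity cost, and
    fuse the filtered images with per-pixel softmax weights."""
    filtered, costs = [], []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            g = np.roll(guidance, (dy, dx), axis=(0, 1))                # translated guidance
            f = joint_bilateral(target, g)
            filtered.append(f)
            costs.append(gaussian_filter(np.abs(f - target), smooth))   # smoothed cost slice
    filtered, costs = np.stack(filtered), np.stack(costs)
    weights = np.exp(-beta * costs)
    weights /= weights.sum(axis=0, keepdims=True)                       # per-pixel weight volume
    return (weights * filtered).sum(axis=0)

rng = np.random.default_rng(0)
target = rng.random((32, 32))
guidance = np.roll(target, (1, -1), axis=(0, 1))   # misaligned guidance image
print(misalignment_robust_filter(target, guidance).shape)
```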
Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.208
Authors: Zhenxing Niu, Mo Zhou, Le Wang, Xinbo Gao, G. Hua
Abstract: We address the problem of dense visual-semantic embedding, which maps not only full sentences and whole images but also phrases within sentences and salient regions within images into a multimodal embedding space. Such dense embeddings, when applied to the task of image captioning, enable us to produce several region-oriented and detailed phrases rather than just an overview sentence to describe an image. Specifically, we present a hierarchical structured recurrent neural network (RNN), namely the Hierarchical Multimodal LSTM (HM-LSTM). Compared with chain-structured RNNs, our proposed model exploits the hierarchical relations between sentences and phrases, and between whole images and image regions, to jointly establish their representations. Without the need for any supervised labels, our proposed model automatically learns the fine-grained correspondences between phrases and image regions towards the dense embedding. Extensive experiments on several datasets validate the efficacy of our method, which compares favorably with state-of-the-art methods.
Pages: 1899-1907 | Citations: 139
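A standard ingredient of any visual-semantic embedding, including the dense one above, is projecting visual and textual features into a shared space and training with a bidirectional ranking loss. The sketch below shows only that generic ingredient, replacing the hierarchical multimodal LSTM encoders with plain linear projections; the feature dimensions and margin are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Generic visual-semantic embedding: project image-region features and
    phrase features into a shared space (the HM-LSTM encoders are replaced by
    plain linear projections in this sketch)."""
    def __init__(self, img_dim=2048, txt_dim=300, dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)

    def forward(self, regions, phrases):
        v = F.normalize(self.img_proj(regions), dim=1)
        s = F.normalize(self.txt_proj(phrases), dim=1)
        return v, s

def ranking_loss(v, s, margin=0.2):
    """Bidirectional hinge ranking loss over matching (region_i, phrase_i) pairs."""
    sim = v @ s.t()                                   # cosine similarities
    pos = sim.diag().unsqueeze(1)                     # similarity of matching pairs
    cost_s = (margin + sim - pos).clamp(min=0)        # phrase retrieval direction
    cost_v = (margin + sim - pos.t()).clamp(min=0)    # region retrieval direction
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    return cost_s.masked_fill(mask, 0).mean() + cost_v.masked_fill(mask, 0).mean()

model = JointEmbedding()
v, s = model(torch.randn(16, 2048), torch.randn(16, 300))
print(ranking_loss(v, s).item())
```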
Deep Facial Action Unit Recognition from Partially Labeled Data
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.426
Authors: Shan Wu, Shangfei Wang, Bowen Pan, Q. Ji
Abstract: Current work on facial action unit (AU) recognition requires AU-labeled facial images. Although large amounts of facial images are readily available, AU annotation is expensive and time-consuming. To address this, we propose a deep facial action unit recognition approach that learns from partially AU-labeled data. The proposed approach makes full use of both the partly available ground-truth AU labels and the readily available large-scale facial images without annotation. Specifically, we propose to learn a label distribution from the ground-truth AU labels, and then train the AU classifiers on the large-scale facial images by simultaneously maximizing the log-likelihood of the mapping functions of AUs with regard to the learned label distribution for all training data and minimizing the error between predicted AUs and ground-truth AUs for the labeled data. A restricted Boltzmann machine is adopted to model the AU label distribution, a deep neural network is used to learn facial representations from facial images, and a support vector machine is employed as the classifier. Experiments on two benchmark databases demonstrate the effectiveness of the proposed approach.
Pages: 3971-3979 | Citations: 23
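The training objective combines a supervised term on the labeled images with a distribution term that ties predictions on all images to a learned AU label distribution. The toy sketch below collapses the paper's RBM-plus-SVM pipeline into a single multi-label network with a fixed label prior; the prior values, the weighting lam, and the network are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_AUS = 12
net = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, NUM_AUS))

# Label prior standing in for the RBM-modeled AU distribution (assumed values):
# per-AU activation frequencies estimated from the labeled subset.
au_prior = torch.full((NUM_AUS,), 0.3)

def partially_labeled_loss(feats, labels, is_labeled, lam=0.1):
    """feats: (B, 512) facial features; labels: (B, NUM_AUS) in {0, 1};
    is_labeled: (B,) bool mask marking which images carry AU annotations."""
    logits = net(feats)
    probs = torch.sigmoid(logits)
    # (a) supervised term on the labeled images only
    sup = F.binary_cross_entropy_with_logits(logits[is_labeled], labels[is_labeled])
    # (b) distribution term on all images: predicted AU marginals vs. learned prior
    dist = F.mse_loss(probs.mean(dim=0), au_prior)
    return sup + lam * dist

feats = torch.randn(32, 512)
labels = (torch.rand(32, NUM_AUS) > 0.7).float()
is_labeled = torch.rand(32) > 0.5          # roughly half the batch is AU-labeled
partially_labeled_loss(feats, labels, is_labeled).backward()
```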