2017 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Shape Inpainting Using 3D Generative Adversarial Network and Recurrent Convolutional Networks
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-11-17 | DOI: 10.1109/ICCV.2017.252
Authors: Weiyue Wang, Qiangui Huang, Suya You, Chao Yang, U. Neumann
Abstract: Recent advances in convolutional neural networks have shown promising results in 3D shape completion, but due to GPU memory limitations these methods can only produce low-resolution outputs. To inpaint 3D models with semantic plausibility and contextual details, we introduce a hybrid framework that combines a 3D Encoder-Decoder Generative Adversarial Network (3D-ED-GAN) and a Long-term Recurrent Convolutional Network (LRCN). The 3D-ED-GAN is a 3D convolutional neural network trained with a generative adversarial paradigm to fill in missing 3D data at low resolution. The LRCN adopts a recurrent neural network architecture to minimize GPU memory usage and incorporates an encoder-decoder pair into a Long Short-Term Memory network. By handling the 3D model as a sequence of 2D slices, the LRCN transforms a coarse 3D shape into a more complete and higher-resolution volume. While the 3D-ED-GAN captures the global contextual structure of the 3D shape, the LRCN localizes the fine-grained details. Experimental results on both real-world and synthetic data show that reconstructions from corrupted models result in complete, high-resolution 3D objects.
Pages: 2317-2325 | Citations: 150
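The abstract above describes an encoder-decoder generator operating on voxel grids and trained adversarially. Below is a minimal PyTorch sketch of such a voxel-in/voxel-out generator; the 32^3 resolution, channel widths, and class name are illustrative assumptions rather than the authors' exact 3D-ED-GAN, and the adversarial discriminator and the LRCN refinement stage are omitted.

```python
import torch
import torch.nn as nn

class VoxelEncoderDecoder(nn.Module):
    """Toy 3D encoder-decoder generator: maps a corrupted 32^3 occupancy grid
    to a completed 32^3 grid. All sizes are illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # 1 x 32^3 -> 256 x 2^3
            nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.BatchNorm3d(64), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.BatchNorm3d(128), nn.LeakyReLU(0.2),
            nn.Conv3d(128, 256, 4, stride=2, padding=1), nn.BatchNorm3d(256), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(  # 256 x 2^3 -> 1 x 32^3
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.BatchNorm3d(32), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # occupancy in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

corrupted = torch.rand(2, 1, 32, 32, 32)      # batch of corrupted voxel grids
completed = VoxelEncoderDecoder()(corrupted)  # same resolution, filled in
print(completed.shape)                        # torch.Size([2, 1, 32, 32, 32])
```

In the paper this generator is trained against a 3D discriminator, and its low-resolution output is then sliced and refined by the LRCN; both stages are left out of this sketch.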
Synergy between Face Alignment and Tracking via Discriminative Global Consensus Optimization
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-26 | DOI: 10.1109/ICCV.2017.409
Authors: M. H. Khan, J. McDonagh, Georgios Tzimiropoulos
Abstract: An open question in facial landmark localization in video is whether one should perform tracking or tracking-by-detection (i.e., face alignment). Tracking produces fittings of high accuracy but is prone to drifting; tracking-by-detection is drift-free but results in low-accuracy fittings. To address this problem, we describe the first, to the best of our knowledge, synergistic approach between detection (face alignment) and tracking, which completely eliminates drifting from face tracking and does not merely perform tracking-by-detection. Our first main contribution is to show that one can achieve this synergy between detection and tracking using a principled optimization framework based on the theory of Global Variable Consensus Optimization using ADMM. Our second contribution is to show how the proposed analytic framework can be integrated within state-of-the-art discriminative methods for face alignment and tracking based on cascaded regression and deeply learned features. Overall, we call our method the Discriminative Global Consensus Model (DGCM). Our third contribution is to show that DGCM achieves a large performance improvement over the currently best-performing face tracking methods on the most challenging category of the 300-VW dataset.
Pages: 3811-3819 | Citations: 36
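The core optimization tool named above is Global Variable Consensus Optimization solved with ADMM. The following is a minimal, generic consensus-ADMM loop on two toy quadratic terms, standing in loosely for a detection term and a tracking term that must agree on one landmark vector; the quadratic objectives, penalty rho, and iteration count are assumptions for illustration, not the DGCM objective itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, iters = 4, 1.0, 100
# Two toy quadratic terms f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((6, n)) for _ in range(2)]
b = [rng.standard_normal(6) for _ in range(2)]

x = [np.zeros(n) for _ in A]   # local copies (one per term)
u = [np.zeros(n) for _ in A]   # scaled dual variables
z = np.zeros(n)                # global consensus variable

for _ in range(iters):
    # local updates: argmin f_i(x) + (rho/2) * ||x - z + u_i||^2
    x = [np.linalg.solve(Ai.T @ Ai + rho * np.eye(n),
                         Ai.T @ bi + rho * (z - ui))
         for Ai, bi, ui in zip(A, b, u)]
    z = np.mean([xi + ui for xi, ui in zip(x, u)], axis=0)   # consensus update
    u = [ui + xi - z for ui, xi in zip(u, x)]                # dual ascent

print(z)  # at convergence, every local copy agrees with z
```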
Multi-view Dynamic Shape Refinement Using Local Temporal Integration
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-22 | DOI: 10.1109/ICCV.2017.336
Authors: Vincent Leroy, Jean-Sébastien Franco, Edmond Boyer
Abstract: We consider 4D shape reconstruction in multi-view environments and investigate how to exploit temporal redundancy for precision refinement. In addition to being beneficial to many dynamic multi-view scenarios, this also enables larger scenes, where the increased precision can compensate for the reduced spatial resolution per image frame. With precision and scalability in mind, we propose a symmetric (non-causal) local time-window geometric integration scheme over temporal sequences, where shape reconstructions are refined framewise by warping local and reliable geometric regions of neighboring frames to them. This is in contrast to recent comparable approaches targeting a different context with more compact scenes and real-time applications. These usually use a single dense volumetric update space or geometric template, which they causally track and update globally frame by frame, with limitations in scalability for larger scenes and, for template-based strategies, in topology and precision. Our templateless and local approach is a first step towards temporal shape super-resolution. We show that it improves reconstruction accuracy by considering multiple frames. To this purpose, and in addition to real data examples, we introduce a multi-camera synthetic dataset that provides ground-truth data for mid-scale dynamic scenes.
Pages: 3113-3122 | Citations: 67
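The refinement step integrates reliable geometry from a symmetric window of neighboring frames onto the current frame. The toy sketch below shows only the symmetric-window, reliability-weighted fusion on depth maps; the warping of neighbors onto the reference frame is omitted, and the window radius and weighting scheme are assumptions.

```python
import numpy as np

def fuse_symmetric_window(depths, confidences, t, radius=2):
    """Refine the depth map of frame t by a confidence-weighted average over a
    symmetric (non-causal) window of neighboring frames. In the actual method
    the neighbors are first warped onto frame t; that step is omitted here."""
    lo, hi = max(0, t - radius), min(len(depths), t + radius + 1)
    d = np.stack(depths[lo:hi])            # (window, H, W)
    w = np.stack(confidences[lo:hi])       # per-pixel reliability in [0, 1]
    return (w * d).sum(axis=0) / np.clip(w.sum(axis=0), 1e-6, None)

# toy sequence: noisy depth maps of a static plane, with random confidences
rng = np.random.default_rng(1)
depths = [2.0 + 0.05 * rng.standard_normal((4, 4)) for _ in range(7)]
confs = [rng.uniform(0.5, 1.0, (4, 4)) for _ in range(7)]
print(fuse_symmetric_window(depths, confs, t=3))
```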
Detect to Track and Track to Detect
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-11 | DOI: 10.1109/ICCV.2017.330
Authors: Christoph Feichtenhofer, A. Pinz, Andrew Zisserman
Abstract: Recent approaches for high-accuracy detection and tracking of object categories in video consist of complex multistage solutions that become more cumbersome each year. In this paper we propose a ConvNet architecture that jointly performs detection and tracking, solving the task in a simple and effective way. Our contributions are threefold: (i) we set up a ConvNet architecture for simultaneous detection and tracking, using a multi-task objective for frame-based object detection and across-frame track regression; (ii) we introduce correlation features that represent object co-occurrences across time to aid the ConvNet during tracking; and (iii) we link the frame-level detections based on our across-frame tracklets to produce high-accuracy detections at the video level. Our ConvNet architecture for spatiotemporal object detection is evaluated on the large-scale ImageNet VID dataset, where it achieves state-of-the-art results. Our approach provides better single-model performance than the winning method of the last ImageNet challenge while being conceptually much simpler. Finally, we show that by increasing the temporal stride we can dramatically increase the tracker speed.
Pages: 3057-3065 | Citations: 485
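Contribution (ii) is a correlation layer that compares feature maps of neighboring frames over a range of displacements. A minimal PyTorch sketch of such local cross-correlation features follows, assuming a small maximum displacement and unit stride; the paper's exact layer configuration may differ.

```python
import torch
import torch.nn.functional as F

def correlation_features(feat_t, feat_tau, max_disp=4):
    """Local cross-correlation between feature maps of two frames.
    feat_t, feat_tau: (B, C, H, W). Returns (B, (2*max_disp+1)^2, H, W),
    one channel per spatial displacement (di, dj) of the second frame."""
    B, C, H, W = feat_t.shape
    padded = F.pad(feat_tau, [max_disp] * 4)             # pad width then height
    out = []
    for di in range(2 * max_disp + 1):
        for dj in range(2 * max_disp + 1):
            shifted = padded[:, :, di:di + H, dj:dj + W]
            out.append((feat_t * shifted).mean(dim=1))   # dot product over channels
    return torch.stack(out, dim=1)

corr = correlation_features(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16))
print(corr.shape)  # torch.Size([1, 81, 16, 16])
```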
Deeper, Broader and Artier Domain Generalization
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-09 | DOI: 10.1109/ICCV.2017.591
Authors: Da Li, Yongxin Yang, Yi-Zhe Song, Timothy M. Hospedales
Abstract: The problem of domain generalization is to learn from multiple training domains and extract a domain-agnostic model that can then be applied to an unseen domain. Domain generalization (DG) has a clear motivation in contexts where there are target domains with distinct characteristics yet sparse training data, for example recognition in sketch images, which are distinctly more abstract and rarer than photos. Nevertheless, DG methods have primarily been evaluated on photo-only benchmarks focusing on alleviating dataset bias, where both problems of domain distinctiveness and data sparsity can be minimal. We argue that these benchmarks are overly straightforward, and show that simple deep learning baselines perform surprisingly well on them. In this paper, we make two main contributions: firstly, we build upon the favorable domain-shift-robust properties of deep learning methods and develop a low-rank parameterized CNN model for end-to-end DG learning; secondly, we develop a DG benchmark dataset covering photo, sketch, cartoon and painting domains. This is both more practically relevant and harder (bigger domain shift) than existing benchmarks. The results show that our method outperforms existing DG alternatives, and our dataset provides a more significant DG challenge to drive future research.
Pages: 5543-5551 | Citations: 1022
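The first contribution is a low-rank parameterized CNN whose per-domain parameters share a common factorization. The sketch below illustrates the idea on a single linear layer: each training domain's weight matrix is a learned combination of a small shared basis, and an averaged combination serves as the domain-agnostic model at test time. The rank, the layer type, and the averaging rule for unseen domains are assumptions, not the paper's exact tensor factorization.

```python
import torch
import torch.nn as nn

class LowRankDomainLinear(nn.Module):
    """One linear layer whose weights are synthesized per training domain from a
    shared rank-K basis: W_d = sum_k alpha[d, k] * B_k. A toy stand-in for a
    low-rank parameterized CNN layer."""
    def __init__(self, in_dim, out_dim, num_domains, rank=3):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(rank, out_dim, in_dim) * 0.02)  # shared basis
        self.alpha = nn.Parameter(torch.ones(num_domains, rank) / rank)       # per-domain mixing
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x, domain=None):
        # domain=None: domain-agnostic weights (average mixing) for an unseen domain
        a = self.alpha.mean(dim=0) if domain is None else self.alpha[domain]
        W = torch.einsum('k,koi->oi', a, self.basis)
        return x @ W.t() + self.bias

layer = LowRankDomainLinear(in_dim=128, out_dim=7, num_domains=3)
features = torch.randn(5, 128)
print(layer(features, domain=1).shape)  # training pass for source domain 1
print(layer(features).shape)            # test pass on an unseen domain
```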
Depth Estimation Using Structured Light Flow — Analysis of Projected Pattern Flow on an Object’s Surface
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-02 | DOI: 10.1109/ICCV.2017.497
Authors: Furukawa Ryo, R. Sagawa, Hiroshi Kawasaki
Abstract: Shape reconstruction techniques using structured light have been widely researched and developed due to their robustness, high precision, and density. Because the techniques are based on decoding a pattern to find correspondences, they implicitly require that the projected patterns be clearly captured by an image sensor, i.e., without defocus or motion blur of the projected pattern. Although intensive research has been conducted on defocus blur, little work addresses motion blur, and the only existing solution is to capture with an extremely fast shutter speed. In this paper, unlike previous approaches, we actively utilize motion blur, which we refer to as a light flow, to estimate depth. Analysis reveals that a minimum of two light flows, retrieved from two patterns projected onto the object, are required for depth estimation. To retrieve two light flows at the same time, two sets of parallel line patterns are illuminated from two video projectors and the size of the motion blur of each line is precisely measured. By analyzing the light flows, i.e., the lengths of the blurs, scene depth information is estimated. In the experiments, 3D shapes of fast-moving objects, which are inevitably captured with motion blur, are successfully reconstructed by our technique.
Pages: 4650-4658 | Citations: 28
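The method measures the length of the motion blur (the "light flow") of each projected line and derives depth from two such flows. The toy snippet below only measures the blur extent along one image row with a simple threshold; the threshold is an assumption, and the paper's analysis relating two light flows to depth is not reproduced here.

```python
import numpy as np

def light_flow_length(row_profile, threshold=0.1):
    """Toy measurement of the 'light flow': the horizontal extent (in pixels)
    over which a projected line smears due to motion blur in one image row."""
    lit = np.flatnonzero(row_profile > threshold * row_profile.max())
    return 0 if lit.size == 0 else int(lit[-1] - lit[0] + 1)

# synthetic row: a projected line smeared over 9 pixels by object motion
row = np.zeros(64)
row[20:29] = 1.0 / 9
print(light_flow_length(row))  # 9

# The paper shows that two such flows, from patterns cast by two projectors,
# suffice to recover depth; that derivation is not repeated in this sketch.
```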
Temporal Shape Super-Resolution by Intra-frame Motion Encoding Using High-fps Structured Light
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-02 | DOI: 10.1109/ICCV.2017.22
Authors: Yuki Shiba, S. Ono, Furukawa Ryo, S. Hiura, Hiroshi Kawasaki
Abstract: One solution for depth imaging of a moving scene is to project a static pattern on the object and use just a single image for reconstruction. However, if the motion of the object is too fast with respect to the exposure time of the image sensor, patterns in the captured image are blurred and reconstruction fails. In this paper, we impose multiple projection patterns onto each single captured image to realize temporal super-resolution of the depth image sequences. With our method, multiple patterns are projected onto the object at a higher fps than is possible with a camera. In this case, the observed pattern varies depending on the depth and motion of the object, so we can extract temporal information of the scene from each single image. The decoding process is realized using a learning-based approach in which no geometric calibration is needed. Experiments confirm the effectiveness of our method, where sequential shapes are reconstructed from a single image. Quantitative evaluations and comparisons with recent techniques were also conducted.
Pages: 115-123 | Citations: 4
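The decoding step is learning-based: a network maps a single image containing several superimposed pattern exposures to the corresponding sub-frame shapes. The sketch below is a toy regressor of K sub-frame depth maps from one captured image; the architecture, K = 4, and the L1 loss are assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

K = 4  # number of sub-frame depth maps recovered per captured image (assumed)

# Toy learning-based decoder: one captured image containing K superimposed
# pattern exposures in, K depth maps out. The architecture is illustrative only.
decoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, K, 3, padding=1),
)

captured = torch.rand(8, 1, 64, 64)   # batch of single blurred-pattern images
target = torch.rand(8, K, 64, 64)     # synthetic ground-truth sub-frame depths
loss = nn.functional.l1_loss(decoder(captured), target)
loss.backward()                       # one training step of the toy decoder
print(loss.item())
```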
Misalignment-Robust Joint Filter for Cross-Modal Image Pairs
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.357
Authors: Takashi Shibata, Masayuki Tanaka, M. Okutomi
Abstract: Although several powerful joint filters for cross-modal image pairs have been proposed, the existing joint filters generate severe artifacts when there are misalignments between the target and guidance images. Our goal is to generate an artifact-free output image even from misaligned target and guidance images. We propose a novel misalignment-robust joint filter based on weight-volume-based image composition and a joint-filter cost volume. Our proposed method first generates a set of translated guidance images. Next, the joint-filter cost volume and a set of filtered images are computed from the target image and the set of translated guidances. Then, a weight volume is obtained from the joint-filter cost volume while considering spatial smoothness and label sparseness. The final output image is composed by fusing the set of filtered images with the weight volume. The key is to generate the final output image directly from the set of filtered images by weighted averaging, using the weight volume obtained from the joint-filter cost volume. The proposed framework is widely applicable and can involve any kind of joint filter. Experimental results show that the proposed method is effective for various applications including image denoising, image up-sampling, haze removal and depth map interpolation.
Pages: 3315-3324 | Citations: 16
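The pipeline above (translated guidances, joint-filter cost volume, weight volume, weighted-average composition) can be illustrated with a toy NumPy version. The joint bilateral filter stand-in, the absolute-difference cost, the Gaussian smoothing of the cost volume, and the per-pixel softmax in place of the paper's smoothness/sparseness optimization are all assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def joint_bilateral(target, guidance, radius=2, sigma=0.1):
    """Stand-in joint filter: range weights come from the guidance image."""
    num = np.zeros_like(target)
    den = np.zeros_like(target)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            g = np.roll(guidance, (dy, dx), axis=(0, 1))
            t = np.roll(target, (dy, dx), axis=(0, 1))
            w = np.exp(-(guidance - g) ** 2 / (2 * sigma ** 2))
            num += w * t
            den += w
    return num / den

def misalignment_robust_filter(target, guidance, max_shift=2, smooth=1.0, beta=10.0):
    """Toy weight-volume composition: filter the target with each translated
    guidance, score each translation by a spatially smoothed fidelity cost, and
    fuse the filtered images with per-pixel softmax weights."""
    filtered, costs = [], []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            g = np.roll(guidance, (dy, dx), axis=(0, 1))                # translated guidance
            f = joint_bilateral(target, g)
            filtered.append(f)
            costs.append(gaussian_filter(np.abs(f - target), smooth))   # smoothed cost slice
    filtered, costs = np.stack(filtered), np.stack(costs)
    weights = np.exp(-beta * costs)
    weights /= weights.sum(axis=0, keepdims=True)                       # per-pixel weight volume
    return (weights * filtered).sum(axis=0)

rng = np.random.default_rng(0)
target = rng.random((32, 32))
guidance = np.roll(target, (1, -1), axis=(0, 1))   # misaligned guidance image
print(misalignment_robust_filter(target, guidance).shape)
```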
Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.208
Authors: Zhenxing Niu, Mo Zhou, Le Wang, Xinbo Gao, G. Hua
Abstract: We address the problem of dense visual-semantic embedding, which maps not only full sentences and whole images but also phrases within sentences and salient regions within images into a multimodal embedding space. Such dense embeddings, when applied to the task of image captioning, enable us to produce several region-oriented and detailed phrases rather than just an overview sentence to describe an image. Specifically, we present a hierarchical structured recurrent neural network (RNN), namely the Hierarchical Multimodal LSTM (HM-LSTM). Compared with chain-structured RNNs, our proposed model exploits the hierarchical relations between sentences and phrases, and between whole images and image regions, to jointly establish their representations. Without the need for any supervised labels, our proposed model automatically learns the fine-grained correspondences between phrases and image regions towards the dense embedding. Extensive experiments on several datasets validate the efficacy of our method, which compares favorably with state-of-the-art methods.
Pages: 1899-1907 | Citations: 139
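A standard ingredient of any visual-semantic embedding, including the dense one above, is projecting visual and textual features into a shared space and training with a bidirectional ranking loss. The sketch below shows only that generic ingredient, replacing the hierarchical multimodal LSTM encoders with plain linear projections; the feature dimensions and margin are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Generic visual-semantic embedding: project image-region features and
    phrase features into a shared space (the HM-LSTM encoders are replaced by
    plain linear projections in this sketch)."""
    def __init__(self, img_dim=2048, txt_dim=300, dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)

    def forward(self, regions, phrases):
        v = F.normalize(self.img_proj(regions), dim=1)
        s = F.normalize(self.txt_proj(phrases), dim=1)
        return v, s

def ranking_loss(v, s, margin=0.2):
    """Bidirectional hinge ranking loss over matching (region_i, phrase_i) pairs."""
    sim = v @ s.t()                                   # cosine similarities
    pos = sim.diag().unsqueeze(1)                     # similarity of matching pairs
    cost_s = (margin + sim - pos).clamp(min=0)        # phrase retrieval direction
    cost_v = (margin + sim - pos.t()).clamp(min=0)    # region retrieval direction
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    return cost_s.masked_fill(mask, 0).mean() + cost_v.masked_fill(mask, 0).mean()

model = JointEmbedding()
v, s = model(torch.randn(16, 2048), torch.randn(16, 300))
print(ranking_loss(v, s).item())
```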
Deep Facial Action Unit Recognition from Partially Labeled Data
2017 IEEE International Conference on Computer Vision (ICCV) | Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.426
Authors: Shan Wu, Shangfei Wang, Bowen Pan, Q. Ji
Abstract: Current work on facial action unit (AU) recognition requires AU-labeled facial images. Although large amounts of facial images are readily available, AU annotation is expensive and time-consuming. To address this, we propose a deep facial action unit recognition approach that learns from partially AU-labeled data. The proposed approach makes full use of both the partly available ground-truth AU labels and the readily available large-scale facial images without annotation. Specifically, we propose to learn a label distribution from the ground-truth AU labels, and then train the AU classifiers on the large-scale facial images by simultaneously maximizing the log-likelihood of the mapping functions of AUs with regard to the learned label distribution for all training data and minimizing the error between predicted AUs and ground-truth AUs for the labeled data. A restricted Boltzmann machine is adopted to model the AU label distribution, a deep neural network is used to learn facial representations from facial images, and a support vector machine is employed as the classifier. Experiments on two benchmark databases demonstrate the effectiveness of the proposed approach.
Pages: 3971-3979 | Citations: 23
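The training objective combines a supervised term on the labeled images with a distribution term that ties predictions on all images to a learned AU label distribution. The toy sketch below collapses the paper's RBM-plus-SVM pipeline into a single multi-label network with a fixed label prior; the prior values, the weighting lam, and the network are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_AUS = 12
net = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, NUM_AUS))

# Label prior standing in for the RBM-modeled AU distribution (assumed values):
# per-AU activation frequencies estimated from the labeled subset.
au_prior = torch.full((NUM_AUS,), 0.3)

def partially_labeled_loss(feats, labels, is_labeled, lam=0.1):
    """feats: (B, 512) facial features; labels: (B, NUM_AUS) in {0, 1};
    is_labeled: (B,) bool mask marking which images carry AU annotations."""
    logits = net(feats)
    probs = torch.sigmoid(logits)
    # (a) supervised term on the labeled images only
    sup = F.binary_cross_entropy_with_logits(logits[is_labeled], labels[is_labeled])
    # (b) distribution term on all images: predicted AU marginals vs. learned prior
    dist = F.mse_loss(probs.mean(dim=0), au_prior)
    return sup + lam * dist

feats = torch.randn(32, 512)
labels = (torch.rand(32, NUM_AUS) > 0.7).float()
is_labeled = torch.rand(32) > 0.5          # roughly half the batch is AU-labeled
partially_labeled_loss(feats, labels, is_labeled).backward()
```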