Latest Publications from the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Representing and Learning High Dimensional Data with the Optimal Transport Map from a Probabilistic Viewpoint
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00820
Serim Park, Matthew Thorpe
{"title":"Representing and Learning High Dimensional Data with the Optimal Transport Map from a Probabilistic Viewpoint","authors":"Serim Park, Matthew Thorpe","doi":"10.1109/CVPR.2018.00820","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00820","url":null,"abstract":"In this paper, we propose a generative model in the space of diffeomorphic deformation maps. More precisely, we utilize the Kantarovich-Wasserstein metric and accompanying geometry to represent an image as a deformation from templates. Moreover, we incorporate a probabilistic viewpoint by assuming that each image is locally generated from a reference image. We capture the local structure by modelling the tangent planes at reference images. Once basis vectors for each tangent plane are learned via probabilistic PCA, we can sample a local coordinate, that can be inverted back to image space exactly. With experiments using 4 different datasets, we show that the generative tangent plane model in the optimal transport (OT) manifold can be learned with small numbers of images and can be used to create infinitely many 'unseen' images. In addition, the Bayesian classification accompanied with the probabilist modeling of the tangent planes shows improved accuracy over that done in the image space. Combining the results of our experiments supports our claim that certain datasets can be better represented with the Kantarovich-Wasserstein metric. We envision that the proposed method could be a practical solution to learning and representing data that is generated with templates in situatons where only limited numbers of data points are available.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73107147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Finding Tiny Faces in the Wild with Generative Adversarial Network
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00010
Yancheng Bai, Yongqiang Zhang, M. Ding, Bernard Ghanem
{"title":"Finding Tiny Faces in the Wild with Generative Adversarial Network","authors":"Yancheng Bai, Yongqiang Zhang, M. Ding, Bernard Ghanem","doi":"10.1109/CVPR.2018.00010","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00010","url":null,"abstract":"Face detection techniques have been developed for decades, and one of remaining open challenges is detecting small faces in unconstrained conditions. The reason is that tiny faces are often lacking detailed information and blurring. In this paper, we proposed an algorithm to directly generate a clear high-resolution face from a blurry small one by adopting a generative adversarial network (GAN). Toward this end, the basic GAN formulation achieves it by super-resolving and refining sequentially (e.g. SR-GAN and cycle-GAN). However, we design a novel network to address the problem of super-resolving and refining jointly. We also introduce new training losses to guide the generator network to recover fine details and to promote the discriminator network to distinguish real vs. fake and face vs. non-face simultaneously. Extensive experiments on the challenging dataset WIDER FACE demonstrate the effectiveness of our proposed method in restoring a clear high-resolution face from a blurry small one, and show that the detection performance outperforms other state-of-the-art methods.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75489037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 177
Single Image Dehazing via Conditional Generative Adversarial Network
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00856
Runde Li, Jin-shan Pan, Zechao Li, Jinhui Tang
{"title":"Single Image Dehazing via Conditional Generative Adversarial Network","authors":"Runde Li, Jin-shan Pan, Zechao Li, Jinhui Tang","doi":"10.1109/CVPR.2018.00856","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00856","url":null,"abstract":"In this paper, we present an algorithm to directly restore a clear image from a hazy image. This problem is highly ill-posed and most existing algorithms often use hand-crafted features, e.g., dark channel, color disparity, maximum contrast, to estimate transmission maps and then atmospheric lights. In contrast, we solve this problem based on a conditional generative adversarial network (cGAN), where the clear image is estimated by an end-to-end trainable neural network. Different from the generative network in basic cGAN, we propose an encoder and decoder architecture so that it can generate better results. To generate realistic clear images, we further modify the basic cGAN formulation by introducing the VGG features and an L1-regularized gradient prior. We also synthesize a hazy dataset including indoor and outdoor scenes to train and evaluate the proposed algorithm. Extensive experimental results demonstrate that the proposed method performs favorably against the state-of-the-art methods on both synthetic dataset and real world hazy images.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75542708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 312
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00375
Abhijit Kundu, Yin Li, James M. Rehg
{"title":"3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare","authors":"Abhijit Kundu, Yin Li, James M. Rehg","doi":"10.1109/CVPR.2018.00375","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00375","url":null,"abstract":"We present a fast inverse-graphics framework for instance-level 3D scene understanding. We train a deep convolutional network that learns to map image regions to the full 3D shape and pose of all object instances in the image. Our method produces a compact 3D representation of the scene, which can be readily used for applications like autonomous driving. Many traditional 2D vision outputs, like instance segmentations and depth-maps, can be obtained by simply rendering our output 3D scene model. We exploit class-specific shape priors by learning a low dimensional shape-space from collections of CAD models. We present novel representations of shape and pose, that strive towards better 3D equivariance and generalization. In order to exploit rich supervisory signals in the form of 2D annotations like segmentation, we propose a differentiable Render-and-Compare loss that allows 3D shape and pose to be learned with 2D supervision. We evaluate our method on the challenging real-world datasets of Pascal3D+ and KITTI, where we achieve state-of-the-art results.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75720749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 287
Hybrid Camera Pose Estimation
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00022
Federico Camposeco, Andrea Cohen, M. Pollefeys, Torsten Sattler
{"title":"Hybrid Camera Pose Estimation","authors":"Federico Camposeco, Andrea Cohen, M. Pollefeys, Torsten Sattler","doi":"10.1109/CVPR.2018.00022","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00022","url":null,"abstract":"In this paper, we aim to solve the pose estimation problem of calibrated pinhole and generalized cameras w.r.t. a Structure-from-Motion (SfM) model by leveraging both 2D-3D correspondences as well as 2D-2D correspondences. Traditional approaches either focus on the use of 2D-3D matches, known as structure-based pose estimation or solely on 2D-2D matches (structure-less pose estimation). Absolute pose approaches are limited in their performance by the quality of the 3D point triangulations as well as the completeness of the 3D model. Relative pose approaches, on the other hand, while being more accurate, also tend to be far more computationally costly and often return dozens of possible solutions. This work aims to bridge the gap between these two paradigms. We propose a new RANSAC-based approach that automatically chooses the best type of solver to use at each iteration in a data-driven way. The solvers chosen by our RANSAC can range from pure structure-based or structure-less solvers, to any possible combination of hybrid solvers (i.e. using both types of matches) in between. A number of these new hybrid minimal solvers are also presented in this paper. Both synthetic and real data experiments show our approach to be as accurate as structure-less approaches, while staying close to the efficiency of structure-based methods.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78023021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00140
Yue Cao, Bin Liu, Mingsheng Long, Jianmin Wang
{"title":"HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN","authors":"Yue Cao, Bin Liu, Mingsheng Long, Jianmin Wang","doi":"10.1109/CVPR.2018.00140","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00140","url":null,"abstract":"Deep learning to hash improves image retrieval performance by end-to-end representation learning and hash coding from training data with pairwise similarity information. Subject to the scarcity of similarity information that is often expensive to collect for many application domains, existing deep learning to hash methods may overfit the training data and result in substantial loss of retrieval quality. This paper presents HashGAN, a novel architecture for deep learning to hash, which learns compact binary hash codes from both real images and diverse images synthesized by generative models. The main idea is to augment the training data with nearly real images synthesized from a new Pair Conditional Wasserstein GAN (PC-WGAN) conditioned on the pairwise similarity information. Extensive experiments demonstrate that HashGAN can generate high-quality binary hash codes and yield state-of-the-art image retrieval performance on three benchmarks, NUS-WIDE, CIFAR-10, and MS-COCO.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80833245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 94
A Prior-Less Method for Multi-face Tracking in Unconstrained Videos
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00063
Chung-Ching Lin, Ying Hung
{"title":"A Prior-Less Method for Multi-face Tracking in Unconstrained Videos","authors":"Chung-Ching Lin, Ying Hung","doi":"10.1109/CVPR.2018.00063","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00063","url":null,"abstract":"This paper presents a prior-less method for tracking and clustering an unknown number of human faces and maintaining their individual identities in unconstrained videos. The key challenge is to accurately track faces with partial occlusion and drastic appearance changes in multiple shots resulting from significant variations of makeup, facial expression, head pose and illumination. To address this challenge, we propose a new multi-face tracking and re-identification algorithm, which provides high accuracy in face association in the entire video with automatic cluster number generation, and is robust to outliers. We develop a co-occurrence model of multiple body parts to seamlessly create face tracklets, and recursively link tracklets to construct a graph for extracting clusters. A Gaussian Process model is introduced to compensate the deep feature insufficiency, and is further used to refine the linking results. The advantages of the proposed algorithm are demonstrated using a variety of challenging music videos and newly introduced body-worn camera videos. The proposed method obtains significant improvements over the state of the art [51], while relying less on handling video-specific prior information to achieve high performance.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80880656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00773
Bin Zhao, Xuelong Li, Xiaoqiang Lu
{"title":"HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization","authors":"Bin Zhao, Xuelong Li, Xiaoqiang Lu","doi":"10.1109/CVPR.2018.00773","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00773","url":null,"abstract":"Although video summarization has achieved great success in recent years, few approaches have realized the influence of video structure on the summarization results. As we know, the video data follow a hierarchical structure, i.e., a video is composed of shots, and a shot is composed of several frames. Generally, shots provide the activity-level information for people to understand the video content. While few existing summarization approaches pay attention to the shot segmentation procedure. They generate shots by some trivial strategies, such as fixed length segmentation, which may destroy the underlying hierarchical structure of video data and further reduce the quality of generated summaries. To address this problem, we propose a structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN, denoted as HSA-RNN. We evaluate the proposed approach on four popular datasets, i.e., SumMe, TVsum, CoSum and VTW. The experimental results have demonstrated the effectiveness of HSA-RNN in the video summarization task.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84164123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 155
Fine-Grained Video Captioning for Sports Narrative
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00629
Huanyu Yu, Shuo Cheng, Bingbing Ni, Minsi Wang, Jian Zhang, Xiaokang Yang
{"title":"Fine-Grained Video Captioning for Sports Narrative","authors":"Huanyu Yu, Shuo Cheng, Bingbing Ni, Minsi Wang, Jian Zhang, Xiaokang Yang","doi":"10.1109/CVPR.2018.00629","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00629","url":null,"abstract":"Despite recent emergence of video caption methods, how to generate fine-grained video descriptions (i.e., long and detailed commentary about individual movements of multiple subjects as well as their frequent interactions) is far from being solved, which however has great applications such as automatic sports narrative. To this end, this work makes the following contributions. First, to facilitate this novel research of fine-grained video caption, we collected a novel dataset called Fine-grained Sports Narrative dataset (FSN) that contains 2K sports videos with ground-truth narratives from YouTube.com. Second, we develop a novel performance evaluation metric named Fine-grained Captioning Evaluation (FCE) to cope with this novel task. Considered as an extension of the widely used METEOR, it measures not only the linguistic performance but also whether the action details and their temporal orders are correctly described. Third, we propose a new framework for fine-grained sports narrative task. This network features three branches: 1) a spatio-temporal entity localization and role discovering sub-network; 2) a fine-grained action modeling sub-network for local skeleton motion description; and 3) a group relationship modeling sub-network to model interactions between players. We further fuse the features and decode them into long narratives by a hierarchically recurrent structure. Extensive experiments on the FSN dataset demonstrates the validity of the proposed framework for fine-grained video caption.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78520722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 51
FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00092
Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang
{"title":"FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis","authors":"Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang","doi":"10.1109/CVPR.2018.00092","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00092","url":null,"abstract":"Face synthesis has achieved advanced development by using generative adversarial networks (GANs). Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photorealistic quality. Their competition converges when the discriminator is unable to differentiate these two domains. Unlike two-player GANs, this work generates identity-preserving faces by proposing FaceID-GAN, which treats a classifier of face identity as the third player, competing with the generator by distinguishing the identities of the real and synthesized faces (see Fig.1). A stationary point is reached when the generator produces faces that have high quality as well as preserve identity. Instead of simply modeling the identity classifier as an additional discriminator, FaceID-GAN is formulated by satisfying information symmetry, which ensures that the real and synthesized images are projected into the same feature space. In other words, the identity classifier is used to extract identity features from both input (real) and output (synthesized) face images of the generator, substantially alleviating training difficulty of GAN. Extensive experiments show that FaceID-GAN is able to generate faces of arbitrary viewpoint while preserve identity, outperforming recent advanced approaches.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72882487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 150