2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition: Latest Papers, Page 10

Neural Style Transfer via Meta Networks

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00841

Falong Shen, Shuicheng Yan, Gang Zeng

Citations: 105

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00376

Wenjie Luo, Bin Yang, R. Urtasun

Citations: 540

Duplex Generative Adversarial Network for Unsupervised Domain Adaptation

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00162

Lanqing Hu, Meina Kan, S. Shan, Xilin Chen

Abstract: Domain adaptation attempts to transfer the knowledge obtained from the source domain to the target domain, i.e., the domain where the testing data are. The main challenge lies in the distribution discrepancy between the source and target domains. Most existing works endeavor to learn a domain-invariant representation, usually by minimizing a distribution distance, e.g., MMD or the discriminator in the recently proposed generative adversarial network (GAN). Following a similar idea to GAN, this work proposes a novel GAN architecture with duplex adversarial discriminators (referred to as DupGAN), which can achieve domain-invariant representation and domain transformation. Specifically, our proposed network consists of three parts: an encoder, a generator, and two discriminators. The encoder embeds samples from both domains into the latent representation, and the generator decodes the latent representation to both the source and target domains, conditioned on a domain code, i.e., it achieves domain transformation. The generator is pitted against duplex discriminators, one for the source domain and the other for the target, to ensure that the domain transformation is realistic, that the latent representation is domain-invariant, and that its category information is preserved. Our proposed work achieves state-of-the-art performance on unsupervised domain adaptation of digit classification and object recognition.

Pages: 1498-1507

Citations: 153

Feature Super-Resolution: Make Machine See More Clearly

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00420

Weimin Tan, Bo Yan, Bahetiyaer Bare

Abstract: Identifying small size images or small objects is a notoriously challenging problem, as discriminative representations are difficult to learn from the limited information contained in them with poor-quality appearance and unclear object structure. Existing research works usually increase the resolution of low-resolution images in the pixel space in order to provide better visual quality for human viewing. However, the improved performance of such methods is usually limited or even trivial in the case of very small image size (we will show it in this paper explicitly). In this paper, different from image super-resolution (ISR), we propose a novel super-resolution technique called feature super-resolution (FSR), which aims at enhancing the discriminatory power of small size images in order to provide high recognition precision for machines. To achieve this goal, we propose a new Feature Super-Resolution Generative Adversarial Network (FSR-GAN) model that transforms the raw poor features of small size images to highly discriminative ones by performing super-resolution in the feature space. Our FSR-GAN consists of two subnetworks: a feature generator network G and a feature discriminator network D. By training the G and D networks alternately, we encourage the G network to discover the latent distribution correlations between small size and large size images and then use G to improve the representations of small images. Extensive experiment results on the Oxford5K, Paris, Holidays, and Flickr100k datasets demonstrate that the proposed FSR approach can effectively enhance the discriminatory ability of features. Even when the resolution of query images is reduced greatly, e.g., to 1/64 of the original size, the query feature enhanced by our FSR approach achieves surprisingly high retrieval performance at different image resolutions and increases the retrieval precision by 25% compared to the raw query feature.

Pages: 3994-4002

Citations: 40

Multi-level Fusion Based 3D Object Detection from Monocular Images

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00249

Bin Xu, Zhenzhong Chen

Abstract: In this paper, we present an end-to-end multi-level fusion based framework for 3D object detection from a single monocular image. The whole network is composed of two parts: one for 2D region proposal generation and another for simultaneous prediction of objects' 2D locations, orientations, dimensions, and 3D locations. With the help of a stand-alone module to estimate the disparity and compute the 3D point cloud, we introduce the multi-level fusion scheme. First, we encode the disparity information with a front view feature representation and fuse it with the RGB image to enhance the input. Second, features extracted from the original input and the point cloud are combined to boost the object detection. For 3D localization, we introduce an extra stream to predict the location information from the point cloud directly and add it to the aforementioned location prediction. The proposed algorithm can directly output both 2D and 3D object detection results in an end-to-end fashion with only a single RGB image as the input. The experimental results on the challenging KITTI benchmark demonstrate that our algorithm significantly outperforms monocular state-of-the-art methods.

Pages: 2345-2353
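
The abstract above mentions a stand-alone module that estimates disparity and computes a 3D point cloud. As a rough illustration only, and not the authors' implementation, the sketch below back-projects a disparity map into 3D points using the standard rectified pinhole relation depth = focal length × baseline / disparity. The function name and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are assumptions introduced here; the listing does not give the paper's actual calibration or conversion details.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def disparity_to_points(
    disparity: Sequence[Sequence[float]],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a disparity map (in pixels) to camera-frame 3D points.

    Assumes a rectified pinhole camera with the same focal length in x and y
    (hypothetical parameters, chosen for illustration).
    """
    points: List[Point3D] = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= 0:
                # Zero or negative disparity carries no usable depth; skip it.
                continue
            z = focal_px * baseline_m / d   # depth along the optical axis
            x = (u - cx) * z / focal_px     # horizontal offset from the principal point
            y = (v - cy) * z / focal_px     # vertical offset from the principal point
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Tiny 1x2 disparity map: the first pixel is invalid, the second is valid.
    cloud = disparity_to_points([[0.0, 10.0]], focal_px=100.0,
                                baseline_m=0.5, cx=0.0, cy=0.0)
    print(cloud)
```

Pixels with non-positive disparity are dropped rather than mapped to infinity, which keeps the resulting point list finite; a real pipeline might mask them differently.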

Citations: 260

A Fast Resection-Intersection Method for the Known Rotation Problem

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00318

Qianggong Zhang, Tat-Jun Chin, Huu Le

Citations: 10

PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00707

Dingwen Zhang, Guangyu Guo, Dong Huang, Junwei Han

Abstract: Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore the motion cue using optical flows. However, optical flow usually contains motion on both the human bodies of interest and the undesired background. This "noisy" motion representation makes pose estimation and action recognition very challenging in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur, and being robust to occlusion. For learning PoseFlow with mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms the state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential for improving two challenging tasks in human video analysis: pose estimation and action recognition.

Pages: 6762-6770

Citations: 31

Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00982

Aojun Zhou, Anbang Yao, Kuan Wang, Yurong Chen

Abstract: Benefiting from tens of millions of hierarchically stacked learnable parameters, Deep Neural Networks (DNNs) have demonstrated overwhelming accuracy on a variety of artificial intelligence tasks. Conversely, the large size of DNN models lays a heavy burden on storage, computation and power consumption, which prohibits their deployment on embedded and mobile systems. In this paper, we propose Explicit Loss-error-aware Quantization (ELQ), a new method that can train DNN models with very low-bit parameter values such as ternary and binary ones to approximate 32-bit floating-point counterparts without noticeable loss of prediction accuracy. Unlike existing methods that usually pose the problem as a straightforward approximation of the layer-wise weights or outputs of the original full-precision model (specifically, minimizing the error of the layer-wise weights or inner products of the weights and the inputs between the original and respective quantized models), our ELQ elaborately bridges the loss perturbation from the weight quantization and an incremental quantization strategy to address DNN quantization. Through explicitly regularizing the loss perturbation and the weight approximation error in an incremental way, we show that such a new optimization method is theoretically reasonable and practically effective. As validated with two mainstream convolutional neural network families (i.e., fully convolutional and non-fully convolutional), our ELQ shows better results than state-of-the-art quantization methods on the large scale ImageNet classification dataset. Code will be made publicly available.

Pages: 9426-9435
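
To make the terms "ternary weights" and "weight approximation error" in the abstract above concrete, here is a minimal sketch of a generic threshold-based ternarization. It is not the ELQ method, which the listing describes only at a high level; it shows just the plain layer-wise weight approximation that the abstract contrasts ELQ against. The function names and the `ratio=0.7` threshold factor are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def ternarize(weights: Sequence[float], ratio: float = 0.7) -> Tuple[float, List[int]]:
    """Map weights to alpha * {-1, 0, +1} with a magnitude threshold.

    The threshold is `ratio` times the mean absolute weight (an assumed
    heuristic, not taken from the paper listed above).
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = ratio * mean_abs
    # Scale factor: mean magnitude of the weights that survive the threshold.
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    return alpha, codes


def approximation_error(weights: Sequence[float], alpha: float, codes: Sequence[int]) -> float:
    """Squared L2 error between the full-precision and ternarized weights."""
    return sum((w - alpha * c) ** 2 for w, c in zip(weights, codes))


if __name__ == "__main__":
    w = [1.0, -1.0, 0.1, 0.0]
    alpha, codes = ternarize(w)
    print(alpha, codes, approximation_error(w, alpha, codes))
```

A loss-aware scheme such as the one the abstract describes would additionally account for how the quantization perturbs the training loss, which this sketch does not attempt.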

Citations: 80

A Common Framework for Interactive Texture Transfer

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00665

Yifang Men, Z. Lian, Yingmin Tang, Jianguo Xiao

Abstract: In this paper, we present a general-purpose solution to interactive texture transfer problems that better preserves both local structure and visual richness. It is challenging due to the diversity of tasks and the simplicity of required user guidance. The core idea of our common framework is to use multiple custom channels to dynamically guide the synthesis process. For interactivity, users can control the spatial distribution of stylized textures via semantic channels. The structure guidance, acquired by two stages of automatic extraction and propagation of structure information, provides a prior for initialization and preserves the salient structure by searching the nearest neighbor fields (NNF) with structure coherence. Meanwhile, texture coherence is also exploited to maintain a style similar to the source image. In addition, we leverage an improved PatchMatch with extended NNF and matrix operations to obtain transformable source patches with richer geometric information at high speed. We demonstrate the effectiveness and superiority of our method on a variety of scenes through extensive comparisons with state-of-the-art algorithms.

Pages: 6353-6362

Citations: 23

Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00217

Shizheng Wang, Wenjuan Liao, P. Surman, Zhigang Tu, Yuanjin Zheng, Junsong Yuan

Abstract: Multi-layer light field displays are a type of computational three-dimensional (3D) display which has recently gained increasing interest for its holographic-like effect and natural compatibility with 2D displays. However, the major shortcoming, depth limitation, still cannot be overcome in the traditional light field modeling and reconstruction based on multi-layer liquid crystal displays (LCDs). Considering this disadvantage, our paper incorporates a salience guided depth optimization over a limited display range to calibrate the displayed depth and present the maximum area of salience region for multi-layer light field display. Different from previously reported cascaded light field displays that use the fixed initialization plane as the depth center of display content, our method automatically calibrates the depth initialization based on the salience results derived from the proposed contrast enhanced salience detection method. Experiments demonstrate that the proposed method provides a promising advantage in visual perception for the compressive light field displays from both software simulation and prototype demonstration.

Pages: 2031-2040

Citations: 13