Attentional Neural Fields for Crowd Counting
Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5713-5722. DOI: 10.1109/ICCV.2019.00581
Abstract: Crowd counting has recently attracted great interest in computer vision and is extremely challenging due to the huge scale variations of objects. In this paper, we propose the Attentional Neural Field (ANF) for crowd counting via density estimation. Within an encoder-decoder network, we introduce conditional random fields (CRFs) to aggregate multi-scale features, which builds more informative representations. To better model pair-wise potentials in the CRFs, we incorporate a non-local attention mechanism, implemented as inter- and intra-layer attention, which expands the receptive field to the entire image both within the same layer and across different layers, capturing the long-range dependencies needed to handle huge scale variations. The CRFs coupled with the attention mechanism are seamlessly integrated into the encoder-decoder network, establishing an ANF that can be optimized end-to-end by back-propagation. We conduct extensive experiments on four public datasets: ShanghaiTech, WorldExpo'10, UCF-CC-50, and UCF-QNRF. The results show that our ANF achieves high counting performance, surpassing most previous methods.
Enriched Feature Guided Refinement Network for Object Detection
Jing Nie, R. Anwer, Hisham Cholakkal, F. Khan, Yanwei Pang, Ling Shao
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9536-9545. DOI: 10.1109/ICCV.2019.00963
Abstract: We propose a single-stage detection framework that jointly tackles the problems of multi-scale object detection and class imbalance. Rather than designing deeper networks, we introduce a simple yet effective feature enrichment scheme to produce multi-scale contextual features. We further introduce a cascaded refinement scheme which first instills the multi-scale contextual features into the prediction layers of the single-stage detector to enrich their discriminative power for multi-scale detection. Second, the cascaded refinement scheme counters the class-imbalance problem by refining the anchors and enriched features to improve classification and regression. Experiments are performed on two benchmarks: PASCAL VOC and MS COCO. For a 320×320 input on MS COCO test-dev, our detector achieves state-of-the-art single-stage detection accuracy with a COCO AP of 33.2 under single-scale inference, while operating at 21 milliseconds on a Titan XP GPU. For a 512×512 input on MS COCO test-dev, our approach obtains an absolute gain of 1.6% in COCO AP compared to the best previously reported single-stage results [5]. Source code and models are available at: https://github.com/Ranchentx/EFGRNet.
Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data
Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Q. Ji
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 733-742. DOI: 10.1109/ICCV.2019.00082
Abstract: Facial action unit (AU) intensity estimation is a fundamental task in facial behaviour analysis. Most previous methods use the whole face image as input for intensity prediction. Since AUs are defined by their corresponding local appearance, a few patch-based methods utilize image features of local patches; however, fusion of the local features is usually performed via straightforward feature concatenation or summation. Moreover, these methods require fully annotated databases for model learning, which are expensive to acquire. In this paper, we propose a novel weakly supervised patch-based deep model built on two types of attention mechanisms for joint intensity estimation of multiple AUs. The model consists of a feature fusion module and a label fusion module, and we augment the attention mechanisms of both modules with a learnable task-related context, since a patch may play different roles when analyzing different AUs and each AU has its own temporal evolution rule. The context-aware feature fusion module captures spatial relationships among local patches, while the context-aware label fusion module captures the temporal dynamics of AUs. The latter enables the model to be trained on a partially annotated database. Experimental evaluations on two benchmark expression databases demonstrate the superior performance of the proposed method.
Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior
Tatsuya Yokota, Kazuya Kawai, M. Sakata, Y. Kimura, H. Hontani
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3126-3135. DOI: 10.1109/ICCV.2019.00322
Abstract: We propose a method that reconstructs dynamic positron emission tomography (PET) images from given sinograms by using non-negative matrix factorization (NMF) incorporated with a deep image prior (DIP) that appropriately constrains the spatial patterns of the resultant images. The proposed method can reconstruct dynamic PET images with a higher signal-to-noise ratio (SNR) and blindly decompose an image matrix into pairs of spatial and temporal factors. The former represent homogeneous tissues with different kinetic parameters, and the latter represent the time-activity curves observed in the corresponding homogeneous tissues. We employ U-Nets combined in parallel for the DIP, with each U-Net extracting one spatial factor decomposed from the data matrix. Experimental results show that the proposed method outperforms conventional methods and can extract spatial factors that represent the homogeneous tissues.
{"title":"Face Video Deblurring Using 3D Facial Priors","authors":"Wenqi Ren, Jiaolong Yang, Senyou Deng, D. Wipf, Xiaochun Cao, Xin Tong","doi":"10.1109/ICCV.2019.00948","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00948","url":null,"abstract":"Existing face deblurring methods only consider single frames and do not account for facial structure and identity information. These methods struggle to deblur face videos that exhibit significant pose variations and misalignment. In this paper we propose a novel face video deblurring network capitalizing on 3D facial priors. The model consists of two main branches: i) a face video deblurring sub-network based on an encoder-decoder architecture, and ii) a 3D face reconstruction and rendering branch for predicting 3D priors of salient facial structures and identity knowledge. These structures encourage the deblurring branch to generate sharp faces with detailed structures. Our method not only uses low-level information (i.e., image intensity), but also middle-level information (i.e., 3D facial structure) and high-level knowledge (i.e., identity content) to further explore spatial constraints of facial components from blurry face frames. Extensive experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"30 1","pages":"9387-9396"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73214623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distillation-Based Training for Multi-Exit Architectures","authors":"Mary Phuong, Christoph H. Lampert","doi":"10.1109/ICCV.2019.00144","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00144","url":null,"abstract":"Multi-exit architectures, in which a stack of processing layers is interleaved with early output layers, allow the processing of a test example to stop early and thus save computation time and/or energy. In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. The method encourages early exits to mimic later, more accurate exits, by matching their probability outputs. Experiments on CIFAR100 and ImageNet show that distillation-based training significantly improves the accuracy of early exits while maintaining state-of-the-art accuracy for late ones. The method is particularly beneficial when training data is limited and also allows a straight-forward extension to semi-supervised learning, i.e. make use also of unlabeled data at training time. Moreover, it takes only a few lines to implement and imposes almost no computational overhead at training time, and none at all at test time.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"1355-1364"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75313299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Class Part Parsing With Joint Boundary-Semantic Awareness","authors":"Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian","doi":"10.1109/ICCV.2019.00927","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00927","url":null,"abstract":"Object part parsing in the wild, which requires to simultaneously detect multiple object classes in the scene and accurately segments semantic parts within each class, is challenging for the joint presence of class-level and part-level ambiguities. Despite its importance, however, this problem is not sufficiently explored in existing works. In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. To handle part-level ambiguity, a boundary awareness module is proposed to make mid-level features at multiple scales attend to part boundaries for accurate part localization, which are then fused with high-level features for effective part recognition. For class-level ambiguity, we further present a semantic awareness module that selects discriminative part features relevant to a category to prevent irrelevant features being merged together. The proposed modules are lightweight and implementation friendly, improving the performance substantially when plugged into various baseline architectures. Without bells and whistles, the full model sets new state-of-the-art results on the Pascal-Part dataset, in both multi-class and the conventional single-class setting, while running substantially faster than recent high-performance approaches.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"2 1","pages":"9176-9185"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75479109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elaborate Monocular Point and Line SLAM With Robust Initialization","authors":"Sang Jun Lee, S. Hwang","doi":"10.1109/ICCV.2019.00121","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00121","url":null,"abstract":"This paper presents a monocular indirect SLAM system which performs robust initialization and accurate localization. For initialization, we utilize a matrix factorization-based method. Matrix factorization-based methods require that extracted feature points must be tracked in all used frames. Since consistent tracking is difficult in challenging environments, a geometric interpolation that utilizes epipolar geometry is proposed. For localization, 3D lines are utilized. We propose the use of Plu ̈cker line coordinates to represent geometric information of lines. We also propose orthonormal representation of Plu ̈cker line coordinates and Jacobians of lines for better optimization. Experimental results show that the proposed initialization generates consistent and robust map in linear time with fast convergence even in challenging scenes. And localization using proposed line representations is faster, more accurate and memory efficient than other state-of-the-art methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"56 1","pages":"1121-1129"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75098151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision","authors":"Jingyun Liu, Gangming Zhao, Yu Fei, Ming Zhang, Yizhou Wang, Yizhou Yu","doi":"10.1109/ICCV.2019.01073","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01073","url":null,"abstract":"Obstacles facing accurate identification and localization of diseases in chest X-ray images lie in the lack of high-quality images and annotations. In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured property of chest X-ray images and localizes diseases via contrastive learning on the aligned positive and negative samples. To force the attention module to focus only on sites of abnormalities, we also introduce a learnable alignment module to adjust all the input images, which eliminates variations of scales, angles, and displacements of X-ray images generated under bad scan conditions. We show that the use of contrastive attention and alignment module allows the model to learn rich identification and localization information using only a small amount of location annotations, resulting in state-of-the-art performance in NIH chest X-ray dataset.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"177 1","pages":"10631-10640"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78031119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Density Map Generation for Crowd Counting","authors":"Jia Wan, Antoni B. Chan","doi":"10.1109/ICCV.2019.00122","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00122","url":null,"abstract":"Crowd counting is an important topic in computer vision due to its practical usage in surveillance systems. The typical design of crowd counting algorithms is divided into two steps. First, the ground-truth density maps of crowd images are generated from the ground-truth dot maps (density map generation), e.g., by convolving with a Gaussian kernel. Second, deep learning models are designed to predict a density map from an input image (density map estimation). Most research efforts have concentrated on the density map estimation problem, while the problem of density map generation has not been adequately explored. In particular, the density map could be considered as an intermediate representation used to train a crowd counting network. In the sense of end-to-end training, the hand-crafted methods used for generating the density maps may not be optimal for the particular network or dataset used. To address this issue, we first show the impact of different density maps and that better ground-truth density maps can be obtained by refining the existing ones using a learned refinement network, which is jointly trained with the counter. Then, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. The counter and generator are trained jointly within an end-to-end framework. The experiment results on popular counting datasets confirm the effectiveness of the proposed learnable density map representations.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"167 1","pages":"1130-1139"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80492265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}