2019 IEEE/CVF International Conference on Computer Vision (ICCV): Latest Publications

Attentional Neural Fields for Crowd Counting
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00581
Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao
{"title":"Attentional Neural Fields for Crowd Counting","authors":"Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao","doi":"10.1109/ICCV.2019.00581","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00581","url":null,"abstract":"Crowd counting has recently generated huge popularity in computer vision, and is extremely challenging due to the huge scale variations of objects. In this paper, we propose the Attentional Neural Field (ANF) for crowd counting via density estimation. Within the encoder-decoder network, we introduce conditional random fields (CRFs) to aggregate multi-scale features, which can build more informative representations. To better model pair-wise potentials in CRFs, we incorperate non-local attention mechanism implemented as inter- and intra-layer attentions to expand the receptive field to the entire image respectively within the same layer and across different layers, which captures long-range dependencies to conquer huge scale variations. The CRFs coupled with the attention mechanism are seamlessly integrated into the encoder-decoder network, establishing an ANF that can be optimized end-to-end by back propagation. We conduct extensive experiments on four public datasets, including ShanghaiTech, WorldEXPO 10, UCF-CC-50 and UCF-QNRF. The results show that our ANF achieves high counting performance, surpassing most previous methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"15 1","pages":"5713-5722"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73993702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 98
Enriched Feature Guided Refinement Network for Object Detection
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00963
Jing Nie, R. Anwer, Hisham Cholakkal, F. Khan, Yanwei Pang, Ling Shao
{"title":"Enriched Feature Guided Refinement Network for Object Detection","authors":"Jing Nie, R. Anwer, Hisham Cholakkal, F. Khan, Yanwei Pang, Ling Shao","doi":"10.1109/ICCV.2019.00963","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00963","url":null,"abstract":"We propose a single-stage detection framework that jointly tackles the problem of multi-scale object detection and class imbalance. Rather than designing deeper networks, we introduce a simple yet effective feature enrichment scheme to produce multi-scale contextual features. We further introduce a cascaded refinement scheme which first instills multi-scale contextual features into the prediction layers of the single-stage detector in order to enrich their discriminative power for multi-scale detection. Second, the cascaded refinement scheme counters the class imbalance problem by refining the anchors and enriched features to improve classification and regression. Experiments are performed on two benchmarks: PASCAL VOC and MS COCO. For a 320×320 input on the MS COCO test-dev, our detector achieves state-of-the-art single-stage detection accuracy with a COCO AP of 33.2 in the case of single-scale inference, while operating at 21 milliseconds on a Titan XP GPU. For a 512×512 input on the MS COCO test-dev, our approach obtains an absolute gain of 1.6% in terms of COCO AP, compared to the best reported single-stage results[5]. Source code and models are available at: https://github.com/Ranchentx/EFGRNet.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"10 1","pages":"9536-9545"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75272716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 68
Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00082
Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Q. Ji
{"title":"Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data","authors":"Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Q. Ji","doi":"10.1109/ICCV.2019.00082","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00082","url":null,"abstract":"Facial action unit (AU) intensity estimation is a fundamental task for facial behaviour analysis. Most previous methods use a whole face image as input for intensity prediction. Considering that AUs are defined according to their corresponding local appearance, a few patch-based methods utilize image features of local patches. However, fusion of local features is always performed via straightforward feature concatenation or summation. Besides, these methods require fully annotated databases for model learning, which is expensive to acquire. In this paper, we propose a novel weakly supervised patch-based deep model on basis of two types of attention mechanisms for joint intensity estimation of multiple AUs. The model consists of a feature fusion module and a label fusion module. And we augment attention mechanisms of these two modules with a learnable task-related context, as one patch may play different roles in analyzing different AUs and each AU has its own temporal evolution rule. The context-aware feature fusion module is used to capture spatial relationships among local patches while the context-aware label fusion module is used to capture the temporal dynamics of AUs. The latter enables the model to be trained on a partially annotated database. Experimental evaluations on two benchmark expression databases demonstrate the superior performance of the proposed method.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"6 1","pages":"733-742"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74712648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00322
Tatsuya Yokota, Kazuya Kawai, M. Sakata, Y. Kimura, H. Hontani
{"title":"Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior","authors":"Tatsuya Yokota, Kazuya Kawai, M. Sakata, Y. Kimura, H. Hontani","doi":"10.1109/ICCV.2019.00322","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00322","url":null,"abstract":"We propose a method that reconstructs dynamic positron emission tomography (PET) images from given sinograms by using non-negative matrix factorization (NMF) incorporated with a deep image prior (DIP) for appropriately constraining the spatial patterns of resultant images. The proposed method can reconstruct dynamic PET images with higher signal-to-noise ratio (SNR) and blindly decompose an image matrix into pairs of spatial and temporal factors. The former represent homogeneous tissues with different kinetic parameters and the latter represent the time activity curves that are observed in the corresponding homogeneous tissues. We employ U-Nets combined in parallel for DIP and each of the U-nets is used to extract each spatial factor decomposed from the data matrix. Experimental results show that the proposed method outperforms conventional methods and can extract spatial factors that represent the homogeneous tissues.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"3126-3135"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73203630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
Face Video Deblurring Using 3D Facial Priors
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00948
Wenqi Ren, Jiaolong Yang, Senyou Deng, D. Wipf, Xiaochun Cao, Xin Tong
{"title":"Face Video Deblurring Using 3D Facial Priors","authors":"Wenqi Ren, Jiaolong Yang, Senyou Deng, D. Wipf, Xiaochun Cao, Xin Tong","doi":"10.1109/ICCV.2019.00948","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00948","url":null,"abstract":"Existing face deblurring methods only consider single frames and do not account for facial structure and identity information. These methods struggle to deblur face videos that exhibit significant pose variations and misalignment. In this paper we propose a novel face video deblurring network capitalizing on 3D facial priors. The model consists of two main branches: i) a face video deblurring sub-network based on an encoder-decoder architecture, and ii) a 3D face reconstruction and rendering branch for predicting 3D priors of salient facial structures and identity knowledge. These structures encourage the deblurring branch to generate sharp faces with detailed structures. Our method not only uses low-level information (i.e., image intensity), but also middle-level information (i.e., 3D facial structure) and high-level knowledge (i.e., identity content) to further explore spatial constraints of facial components from blurry face frames. Extensive experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"30 1","pages":"9387-9396"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73214623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Distillation-Based Training for Multi-Exit Architectures
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00144
Mary Phuong, Christoph H. Lampert
{"title":"Distillation-Based Training for Multi-Exit Architectures","authors":"Mary Phuong, Christoph H. Lampert","doi":"10.1109/ICCV.2019.00144","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00144","url":null,"abstract":"Multi-exit architectures, in which a stack of processing layers is interleaved with early output layers, allow the processing of a test example to stop early and thus save computation time and/or energy. In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. The method encourages early exits to mimic later, more accurate exits, by matching their probability outputs. Experiments on CIFAR100 and ImageNet show that distillation-based training significantly improves the accuracy of early exits while maintaining state-of-the-art accuracy for late ones. The method is particularly beneficial when training data is limited and also allows a straight-forward extension to semi-supervised learning, i.e. make use also of unlabeled data at training time. Moreover, it takes only a few lines to implement and imposes almost no computational overhead at training time, and none at all at test time.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"1355-1364"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75313299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 120
Multi-Class Part Parsing With Joint Boundary-Semantic Awareness
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00927
Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian
{"title":"Multi-Class Part Parsing With Joint Boundary-Semantic Awareness","authors":"Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian","doi":"10.1109/ICCV.2019.00927","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00927","url":null,"abstract":"Object part parsing in the wild, which requires to simultaneously detect multiple object classes in the scene and accurately segments semantic parts within each class, is challenging for the joint presence of class-level and part-level ambiguities. Despite its importance, however, this problem is not sufficiently explored in existing works. In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. To handle part-level ambiguity, a boundary awareness module is proposed to make mid-level features at multiple scales attend to part boundaries for accurate part localization, which are then fused with high-level features for effective part recognition. For class-level ambiguity, we further present a semantic awareness module that selects discriminative part features relevant to a category to prevent irrelevant features being merged together. The proposed modules are lightweight and implementation friendly, improving the performance substantially when plugged into various baseline architectures. Without bells and whistles, the full model sets new state-of-the-art results on the Pascal-Part dataset, in both multi-class and the conventional single-class setting, while running substantially faster than recent high-performance approaches.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"2 1","pages":"9176-9185"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75479109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Elaborate Monocular Point and Line SLAM With Robust Initialization
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00121
Sang Jun Lee, S. Hwang
{"title":"Elaborate Monocular Point and Line SLAM With Robust Initialization","authors":"Sang Jun Lee, S. Hwang","doi":"10.1109/ICCV.2019.00121","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00121","url":null,"abstract":"This paper presents a monocular indirect SLAM system which performs robust initialization and accurate localization. For initialization, we utilize a matrix factorization-based method. Matrix factorization-based methods require that extracted feature points must be tracked in all used frames. Since consistent tracking is difficult in challenging environments, a geometric interpolation that utilizes epipolar geometry is proposed. For localization, 3D lines are utilized. We propose the use of Plu ̈cker line coordinates to represent geometric information of lines. We also propose orthonormal representation of Plu ̈cker line coordinates and Jacobians of lines for better optimization. Experimental results show that the proposed initialization generates consistent and robust map in linear time with fast convergence even in challenging scenes. And localization using proposed line representations is faster, more accurate and memory efficient than other state-of-the-art methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"56 1","pages":"1121-1129"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75098151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.01073
Jingyun Liu, Gangming Zhao, Yu Fei, Ming Zhang, Yizhou Wang, Yizhou Yu
{"title":"Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision","authors":"Jingyun Liu, Gangming Zhao, Yu Fei, Ming Zhang, Yizhou Wang, Yizhou Yu","doi":"10.1109/ICCV.2019.01073","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01073","url":null,"abstract":"Obstacles facing accurate identification and localization of diseases in chest X-ray images lie in the lack of high-quality images and annotations. In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured property of chest X-ray images and localizes diseases via contrastive learning on the aligned positive and negative samples. To force the attention module to focus only on sites of abnormalities, we also introduce a learnable alignment module to adjust all the input images, which eliminates variations of scales, angles, and displacements of X-ray images generated under bad scan conditions. We show that the use of contrastive attention and alignment module allows the model to learn rich identification and localization information using only a small amount of location annotations, resulting in state-of-the-art performance in NIH chest X-ray dataset.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"177 1","pages":"10631-10640"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78031119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 79
Adaptive Density Map Generation for Crowd Counting
2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00122
Jia Wan, Antoni B. Chan
{"title":"Adaptive Density Map Generation for Crowd Counting","authors":"Jia Wan, Antoni B. Chan","doi":"10.1109/ICCV.2019.00122","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00122","url":null,"abstract":"Crowd counting is an important topic in computer vision due to its practical usage in surveillance systems. The typical design of crowd counting algorithms is divided into two steps. First, the ground-truth density maps of crowd images are generated from the ground-truth dot maps (density map generation), e.g., by convolving with a Gaussian kernel. Second, deep learning models are designed to predict a density map from an input image (density map estimation). Most research efforts have concentrated on the density map estimation problem, while the problem of density map generation has not been adequately explored. In particular, the density map could be considered as an intermediate representation used to train a crowd counting network. In the sense of end-to-end training, the hand-crafted methods used for generating the density maps may not be optimal for the particular network or dataset used. To address this issue, we first show the impact of different density maps and that better ground-truth density maps can be obtained by refining the existing ones using a learned refinement network, which is jointly trained with the counter. Then, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. The counter and generator are trained jointly within an end-to-end framework. The experiment results on popular counting datasets confirm the effectiveness of the proposed learnable density map representations.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"167 1","pages":"1130-1139"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80492265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 132