Latest Publications from the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Explainability Methods for Graph Convolutional Neural Networks
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01103
Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, Heiko Hoffmann
{"title":"Explainability Methods for Graph Convolutional Neural Networks","authors":"Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, Heiko Hoffmann","doi":"10.1109/CVPR.2019.01103","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01103","url":null,"abstract":"With the growing use of graph convolutional neural networks (GCNNs) comes the need for explainability. In this paper, we introduce explainability methods for GCNNs. We develop the graph analogues of three prominent explainability methods for convolutional neural networks: contrastive gradient-based (CG) saliency maps, Class Activation Mapping (CAM), and Excitation Back-Propagation (EB) and their variants, gradient-weighted CAM (Grad-CAM) and contrastive EB (c-EB). We show a proof-of-concept of these methods on classification problems in two application domains: visual scene graphs and molecular graphs. To compare the methods, we identify three desirable properties of explanations: (1) their importance to classification, as measured by the impact of occlusions, (2) their contrastivity with respect to different classes, and (3) their sparseness on a graph. We call the corresponding quantitative metrics fidelity, contrastivity, and sparsity and evaluate them for each method. Lastly, we analyze the salient subgraphs obtained from explanations and report frequently occurring patterns.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"10764-10773"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83481975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 326
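As a rough illustration of one of these graph analogues, below is a minimal sketch of Grad-CAM applied to the node activations of a toy two-layer GCN: gradients of the target class score weight the final-layer node features, giving a per-node relevance score. The `GCNLayer` module, mean-pooling readout, and tensor shapes are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One toy graph convolution: relu(Linear(A_hat @ X))."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, A_hat, X):
        return torch.relu(self.lin(A_hat @ X))

def grad_cam_nodes(layers, classifier, A_hat, X, target_class):
    """Grad-CAM analogue on a graph: weight final-layer node activations by the
    gradient of the target class score, then sum over channels per node."""
    h = X
    for layer in layers:
        h = layer(A_hat, h)
    h.retain_grad()                                   # keep grads on node activations
    score = classifier(h.mean(dim=0))[target_class]   # mean-pool readout + classifier
    score.backward()
    alpha = h.grad.mean(dim=0)                        # channel importance weights (F,)
    cam = torch.relu((h * alpha).sum(dim=1))          # per-node relevance (N,)
    return cam / (cam.max() + 1e-8)

# Toy usage: 6 nodes, 8-dim features, 3 classes, identity adjacency for brevity.
A_hat = torch.eye(6)
X = torch.randn(6, 8)
layers = [GCNLayer(8, 16), GCNLayer(16, 16)]
clf = nn.Linear(16, 3)
print(grad_cam_nodes(layers, clf, A_hat, X, target_class=1))
```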
Residual Regression With Semantic Prior for Crowd Counting
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00416
Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, Wei Liu
{"title":"Residual Regression With Semantic Prior for Crowd Counting","authors":"Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, Wei Liu","doi":"10.1109/CVPR.2019.00416","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00416","url":null,"abstract":"Crowd counting is a challenging task due to factors such as large variations in crowdedness and severe occlusions. Although recent deep learning based counting algorithms have achieved a great progress, the correlation knowledge among samples and the semantic prior have not yet been fully exploited. In this paper, a residual regression framework is proposed for crowd counting utilizing the correlation information among samples. By incorporating such information into our network, we discover that more intrinsic characteristics can be learned by the network which thus generalizes better to unseen scenarios. Besides, we show how to effectively leverage the semantic prior to improve the performance of crowd counting. We also observe that the adversarial loss can be used to improve the quality of predicted density maps, thus leading to an improvement in crowd counting. Experiments on public datasets demonstrate the effectiveness and generalization ability of the proposed method.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"27 1","pages":"4031-4040"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87297270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 100
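A hedged sketch of the core residual-regression idea described in the abstract: rather than regressing a density map from scratch, predict the residual between the test image and a retrieved support image whose ground-truth density map is known. The tiny convolutional head, channel sizes, and feature shapes below are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy residual head; channels and depth are assumptions, not the paper's network.
residual_head = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 1),
)

def predict_density(test_feat, support_feat, support_density):
    """Regress the residual between the test image's density map and that of a
    retrieved, similar support image with a known ground-truth map."""
    residual = residual_head(torch.cat([test_feat, support_feat], dim=1))
    density = support_density + residual       # corrected density map
    return density, density.sum()              # crowd count = integral of the map

# Toy usage with random tensors standing in for backbone features.
test_feat = torch.randn(1, 64, 32, 32)
support_feat = torch.randn(1, 64, 32, 32)
support_density = torch.rand(1, 1, 32, 32)
density, count = predict_density(test_feat, support_feat, support_density)
print(density.shape, float(count))
```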
Monocular Depth Estimation Using Relative Depth Maps
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00996
Jae-Han Lee, Chang-Su Kim
{"title":"Monocular Depth Estimation Using Relative Depth Maps","authors":"Jae-Han Lee, Chang-Su Kim","doi":"10.1109/CVPR.2019.00996","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00996","url":null,"abstract":"We propose a novel algorithm for monocular depth estimation using relative depth maps. First, using a convolutional neural network, we estimate relative depths between pairs of regions, as well as ordinary depths, at various scales. Second, we restore relative depth maps from selectively estimated data based on the rank-1 property of pairwise comparison matrices. Third, we decompose ordinary and relative depth maps into components and recombine them optimally to reconstruct a final depth map. Experimental results show that the proposed algorithm provides the state-of-art depth estimation performance.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"5 1","pages":"9721-9730"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87736584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 105
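The rank-1 property mentioned in the abstract can be illustrated with a small worked example: if the pairwise comparison matrix stores depth ratios M[i, j] ≈ d_i / d_j, the matrix is rank 1 and its leading eigenvector recovers the depths up to a global scale. The noise level and matrix size below are arbitrary; in the paper such comparisons are estimated by a CNN over region pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
d_true = rng.uniform(1.0, 10.0, size=5)                     # hidden relative depths
M = d_true[:, None] / d_true[None, :]                        # pairwise ratio matrix, rank 1
M_noisy = M * np.exp(0.02 * rng.standard_normal(M.shape))    # estimated ratios are noisy

# For a rank-1 positive matrix M[i, j] = d_i / d_j, every eigenvector with a
# nonzero eigenvalue is proportional to d, so the leading eigenvector of the
# noisy matrix recovers d up to an unknown global scale.
w, v = np.linalg.eig(M_noisy)
d_hat = np.abs(v[:, np.argmax(w.real)].real)
d_hat *= d_true[0] / d_hat[0]                                # fix the unknown scale
print(np.round(d_true, 3), np.round(d_hat, 3))
```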
Deeply-Supervised Knowledge Synergy
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00716
Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao
{"title":"Deeply-Supervised Knowledge Synergy","authors":"Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao","doi":"10.1109/CVPR.2019.00716","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00716","url":null,"abstract":"Convolutional Neural Networks (CNNs) have become deeper and more complicated compared with the pioneering AlexNet. However, current prevailing training scheme follows the previous way of adding supervision to the last layer of the network only and propagating error information up layer-by-layer. In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. Inspired by the deeply-supervised learning scheme, we first append auxiliary supervision branches on top of certain intermediate network layers. While properly using auxiliary supervision can improve model accuracy to some degree, we go one step further to explore the possibility of utilizing the probabilistic knowledge dynamically learnt by the classifiers connected to the backbone network as a new regularization to improve the training. A novel synergy loss, which considers pairwise knowledge matching among all supervision branches, is presented. Intriguingly, it enables dense pairwise knowledge matching operations in both top-down and bottom-up directions at each training iteration, resembling a dynamic synergy process for the same task. We evaluate DKS on image classification datasets using state-of-the-art CNN architectures, and show that the models trained with it are consistently better than the corresponding counterparts. For instance, on the ImageNet classification benchmark, our ResNet-152 model outperforms the baseline model with a 1.47% margin in Top-1 accuracy. Code is available at https://github.com/sundw2014/DKS.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"61 1","pages":"6990-6999"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90591628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
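A minimal sketch of what a dense pairwise knowledge-matching term could look like: every supervision branch matches the softened predictions of every other branch via KL divergence, in both directions. The temperature, the detached "teacher" side, and the branch count are assumptions; the authors' repository contains the actual loss.

```python
import torch
import torch.nn.functional as F

def pairwise_knowledge_loss(branch_logits, T=2.0):
    """Sum KL(student_i || teacher_j) over all ordered branch pairs, using
    temperature-softened predictions (a sketch, not the released DKS loss)."""
    loss = branch_logits[0].new_zeros(())
    for i, zi in enumerate(branch_logits):
        for j, zj in enumerate(branch_logits):
            if i == j:
                continue
            log_p = F.log_softmax(zi / T, dim=1)          # branch i as "student"
            q = F.softmax(zj / T, dim=1).detach()         # branch j as "teacher" (assumed detached)
            loss = loss + F.kl_div(log_p, q, reduction="batchmean") * (T * T)
    return loss

# Toy usage: logits from the backbone classifier and two auxiliary branches.
logits = [torch.randn(16, 10, requires_grad=True) for _ in range(3)]
print(pairwise_knowledge_loss(logits))
```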
Predicting Visible Image Differences Under Varying Display Brightness and Viewing Distance
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00558
Nanyang Ye, Krzysztof Wolski, Rafał K. Mantiuk
{"title":"Predicting Visible Image Differences Under Varying Display Brightness and Viewing Distance","authors":"Nanyang Ye, Krzysztof Wolski, Rafał K. Mantiuk","doi":"10.1109/CVPR.2019.00558","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00558","url":null,"abstract":"Numerous applications require a robust metric that can predict whether image differences are visible or not. However, the accuracy of existing white-box visibility metrics, such as HDR-VDP, is often not good enough. CNN-based black-box visibility metrics have proven to be more accurate, but they cannot account for differences in viewing conditions, such as display brightness and viewing distance. In this paper, we propose a CNN-based visibility metric, which maintains the accuracy of deep network solutions and accounts for viewing conditions. To achieve this, we extend the existing dataset of locally visible differences (LocVis) with a new set of measurements, collected considering aforementioned viewing conditions. Then, we develop a hybrid model that combines white-box processing stages for modeling the effects of luminance masking and contrast sensitivity, with a black-box deep neural network. We demonstrate that the novel hybrid model can handle the change of viewing conditions correctly and outperforms state-of-the-art metrics.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"53 1","pages":"5429-5437"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85621061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
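One white-box stage such a hybrid model needs is a display model that converts pixel values to absolute luminance for a given display brightness. Below is a minimal sketch using a standard gain-gamma-offset model; the parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def display_luminance(V, L_peak=200.0, L_black=0.5, gamma=2.2):
    """Map normalized pixel values in [0, 1] to absolute luminance in cd/m^2
    with a gain-gamma-offset display model (illustrative defaults)."""
    return (L_peak - L_black) * np.power(V, gamma) + L_black

# The same image shown at 30 vs. 200 cd/m^2 peak brightness produces very
# different absolute luminances, which a pixel-value-only CNN cannot account for.
V = np.linspace(0.0, 1.0, 5)
print(display_luminance(V, L_peak=30.0))
print(display_luminance(V, L_peak=200.0))
```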
Learning Words by Drawing Images
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00213
Dídac Surís, Adrià Recasens, David Bau, David F. Harwath, James R. Glass, A. Torralba
{"title":"Learning Words by Drawing Images","authors":"Dídac Surís, Adrià Recasens, David Bau, David F. Harwath, James R. Glass, A. Torralba","doi":"10.1109/CVPR.2019.00213","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00213","url":null,"abstract":"We propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes, from a dataset of spoken descriptions of images. Building upon recent findings that GAN representations can be manipulated to edit semantic concepts in the generated output, we propose a new method to use such GAN-generated images to train a model using a triplet loss. To apply the method, we develop Audio CLEVRGAN, a new dataset of audio descriptions of GAN-generated CLEVR images, and we describe a training procedure that creates a curriculum of GAN-generated images that focuses training on image pairs that differ in a specific, informative way. Training is done without additional supervision beyond the spoken captions and the GAN. We find that training that takes advantage of GAN-generated edited examples results in improvements in the model's ability to learn attributes compared to previous results. Our proposed learning framework also results in models that can associate spoken words with some abstract visual concepts such as color and size.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"213 1 1","pages":"2029-2038"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85642682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
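A minimal sketch of the triplet objective described in the abstract: the spoken-caption embedding is pulled towards the embedding of the GAN image it describes and pushed away from a GAN-edited version that differs in one attribute. The random tensors stand in for embeddings produced by the audio and image networks; the dimensions and margin are assumptions.

```python
import torch
import torch.nn.functional as F

audio_emb = torch.randn(8, 512, requires_grad=True)   # anchor: spoken-caption embedding
pos_img_emb = torch.randn(8, 512)                      # positive: image the caption describes
neg_img_emb = torch.randn(8, 512)                      # negative: GAN-edited image differing
                                                       # in one attribute (e.g. color)
loss = F.triplet_margin_loss(audio_emb, pos_img_emb, neg_img_emb, margin=1.0)
loss.backward()
print(float(loss))
```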
BAD SLAM: Bundle Adjusted Direct RGB-D SLAM
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00022
Thomas Schöps, Torsten Sattler, M. Pollefeys
{"title":"BAD SLAM: Bundle Adjusted Direct RGB-D SLAM","authors":"Thomas Schöps, Torsten Sattler, M. Pollefeys","doi":"10.1109/CVPR.2019.00022","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00022","url":null,"abstract":"A key component of Simultaneous Localization and Mapping (SLAM) systems is the joint optimization of the estimated 3D map and camera trajectory. Bundle adjustment (BA) is the gold standard for this. Due to the large number of variables in dense RGB-D SLAM, previous work has focused on approximating BA. In contrast, in this paper we present a novel, fast direct BA formulation which we implement in a real-time dense RGB-D SLAM algorithm. In addition, we show that direct RGB-D SLAM systems are highly sensitive to rolling shutter, RGB and depth sensor synchronization, and calibration errors. In order to facilitate state-of-the-art research on direct RGB-D SLAM, we propose a novel, well-calibrated benchmark for this task that uses synchronized global shutter RGB and depth cameras. It includes a training set, a test set without public ground truth, and an online evaluation service. We observe that the ranking of methods changes on this dataset compared to existing ones, and our proposed algorithm outperforms all other evaluated SLAM methods. Our benchmark and our open source SLAM algorithm are available at: www.eth3d.net","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"23 1","pages":"134-144"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85972063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 174
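For intuition, one direct (photometric) residual of the kind a direct bundle adjustment minimizes can be sketched as follows: project a 3D point into a keyframe and compare the observed intensity with the point's stored intensity. This ignores the paper's surfel parameterization and robust cost; the pinhole model, nearest-pixel lookup, and variable names are simplifications for illustration.

```python
import numpy as np

def photometric_residual(p_world, intensity, T_wc, K, image):
    """Project a 3D point with world-to-camera pose T_wc and intrinsics K into a
    keyframe image, then return observed minus stored intensity."""
    p_cam = T_wc[:3, :3] @ p_world + T_wc[:3, 3]
    uvw = K @ p_cam
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]        # pinhole projection
    observed = image[int(round(v)), int(round(u))]  # nearest-pixel lookup for brevity
    return observed - intensity

# Toy usage with an identity pose and a synthetic grayscale image.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T_wc = np.eye(4)
image = np.random.default_rng(0).random((480, 640))
print(photometric_residual(np.array([0.1, 0.0, 2.0]), 0.4, T_wc, K, image))
```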
An Alternative Deep Feature Approach to Line Level Keyword Spotting
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01294
George Retsinas, G. Louloudis, N. Stamatopoulos, Giorgos Sfikas, B. Gatos
{"title":"An Alternative Deep Feature Approach to Line Level Keyword Spotting","authors":"George Retsinas, G. Louloudis, N. Stamatopoulos, Giorgos Sfikas, B. Gatos","doi":"10.1109/CVPR.2019.01294","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01294","url":null,"abstract":"Keyword spotting (KWS) is defined as the problem of detecting all instances of a given word, provided by the user either as a query word image (Query-by-Example, QbE) or a query word string (Query-by-String, QbS) in a body of digitized documents. Keyword detection is typically preceded by a preprocessing step where the text is segmented into text lines (line-level KWS). Methods following this paradigm are monopolized by test-time computationally expensive handwritten text recognition (HTR)-based approaches; furthermore, they typically cannot handle image queries (QbE). In this work, we propose a time and storage-efficient, deep feature-based approach that enables both the image and textual search options. Three distinct components, all modeled as neural networks, are combined: normalization, feature extraction and representation of image and textual input into a common space. These components, even if designed on word level image representations, collaborate in order to achieve an efficient line level keyword spotting system. The experimental results indicate that the proposed system is on par with state-of-the-art KWS methods.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"12650-12658"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90595454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
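Once line images and text queries are mapped into a common space, spotting reduces to nearest-neighbour retrieval. The sketch below ranks line embeddings by cosine similarity to a query embedding; the random tensors stand in for the outputs of the paper's normalization, feature-extraction, and representation networks, and the dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

line_embs = F.normalize(torch.randn(1000, 256), dim=1)   # text-line image embeddings
query_emb = F.normalize(torch.randn(256), dim=0)          # QbS (string) or QbE (image) query

scores = line_embs @ query_emb                    # cosine similarity to every line
ranking = torch.argsort(scores, descending=True)  # lines most likely to contain the keyword
print(ranking[:5], scores[ranking[:5]])
```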
Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00368
Xinchen Liu, Wu Liu, Meng Zhang, Jingwen Chen, Lianli Gao, C. Yan, Tao Mei
{"title":"Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning","authors":"Xinchen Liu, Wu Liu, Meng Zhang, Jingwen Chen, Lianli Gao, C. Yan, Tao Mei","doi":"10.1109/CVPR.2019.00368","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00368","url":null,"abstract":"Discovering social relations, e.g., kinship, friendship, etc., from visual contents can make machines better interpret the behaviors and emotions of human beings. Existing studies mainly focus on recognizing social relations from still images while neglecting another important media--video. On one hand, the actions and storylines in videos provide more important cues for social relation recognition. On the other hand, the key persons may appear at arbitrary spatial-temporal locations, even not in one same image from beginning to the end. To overcome these challenges, we propose a Multi-scale Spatial-Temporal Reasoning (MSTR) framework to recognize social relations from videos. For the spatial representation, we not only adopt a temporal segment network to learn global action and scene information, but also design a Triple Graphs model to capture visual relations between persons and objects. For the temporal domain, we propose a Pyramid Graph Convolutional Network to perform temporal reasoning with multi-scale receptive fields, which can obtain both long-term and short-term storylines in videos. By this means, MSTR can comprehensively explore the multi-scale actions and storylines in spatial-temporal dimensions for social relation reasoning in videos. Extensive experiments on a new large-scale Video Social Relation dataset demonstrate the effectiveness of the proposed framework.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"104 1","pages":"3561-3569"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87486283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 60
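The graph-based reasoning over persons and objects can be illustrated with one generic graph-convolution update; the paper's Triple Graphs and Pyramid Graph Convolutional Network build on this kind of relational aggregation but are considerably more involved. Node counts and feature sizes below are arbitrary.

```python
import torch

def gcn_step(A, X, W):
    """One symmetric-normalized graph-convolution update over a person/object
    graph: relu(D^-1/2 (A + I) D^-1/2 X W)."""
    A_loop = A + torch.eye(A.shape[0])                     # add self-loops
    d_inv_sqrt = A_loop.sum(dim=1).clamp(min=1e-8).pow(-0.5)
    A_hat = d_inv_sqrt[:, None] * A_loop * d_inv_sqrt[None, :]
    return torch.relu(A_hat @ X @ W)

# Toy usage: 5 nodes (e.g. 2 persons + 3 objects) with 32-dim appearance features.
A = (torch.rand(5, 5) > 0.5).float()
A = ((A + A.T) > 0).float()                                # make the graph undirected
X = torch.randn(5, 32)
W = torch.randn(32, 32)
print(gcn_step(A, X, W).shape)
```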
Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00904
D. Neven, Bert De Brabandere, M. Proesmans, L. Gool
{"title":"Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth","authors":"D. Neven, Bert De Brabandere, M. Proesmans, L. Gool","doi":"10.1109/CVPR.2019.00904","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00904","url":null,"abstract":"Current state-of-the-art instance segmentation methods are not suited for real-time applications like autonomous driving, which require fast execution times at high accuracy. Although the currently dominant proposal-based methods have high accuracy, they are slow and generate masks at a fixed and low resolution. Proposal-free methods, by contrast, can generate masks at high resolution and are often faster, but fail to reach the same accuracy as the proposal-based methods. In this work we propose a new clustering loss function for proposal-free instance segmentation. The loss function pulls the spatial embeddings of pixels belonging to the same instance together and jointly learns an instance-specific clustering bandwidth, maximizing the intersection-over-union of the resulting instance mask. When combined with a fast architecture, the network can perform instance segmentation in real-time while maintaining a high accuracy. We evaluate our method on the challenging Cityscapes benchmark and achieve top results (5% improvement over Mask R-CNN) at more than 10 fps on 2MP images.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"8829-8837"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87766096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 215
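A hedged sketch of the clustering idea in the abstract: each pixel's spatial embedding is its coordinate plus a predicted offset, and a learned, instance-specific bandwidth sigma turns the distance to the instance centre into a membership probability. The IoU-maximizing loss applied on top of this probability map is omitted, and the coordinate convention is an assumption.

```python
import torch

def instance_probability(offsets, sigma_map, mask):
    """Per-pixel probability of belonging to one instance: spatial embeddings are
    pixel coordinates plus predicted offsets; the centre and clustering bandwidth
    sigma are averaged over the instance's ground-truth mask."""
    h, w = mask.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    coords = torch.stack([xs, ys])                    # (2, H, W) pixel coordinates
    emb = coords + offsets                            # (2, H, W) spatial embeddings
    center = emb[:, mask].mean(dim=1, keepdim=True)   # (2, 1) instance centre
    sigma = sigma_map[mask].mean()                    # learned, instance-specific bandwidth
    dist2 = ((emb.flatten(1) - center) ** 2).sum(dim=0)
    return torch.exp(-dist2 / (2 * sigma ** 2 + 1e-8)).view(h, w)

# Toy usage: a 16x16 crop with one instance occupying the upper-left quadrant.
offsets = torch.zeros(2, 16, 16)
sigma_map = torch.full((16, 16), 3.0)
mask = torch.zeros(16, 16, dtype=torch.bool)
mask[:8, :8] = True
prob = instance_probability(offsets, sigma_map, mask)
print(float(prob[4, 4]), float(prob[15, 15]))         # high inside, low far away
```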