2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition: Latest Publications

Learning Answer Embeddings for Visual Question Answering
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00569
Hexiang Hu, Wei-Lun Chao, Fei Sha
Abstract: We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly, and the other for the answers. The learning objective is to find the best parameterization of those embeddings such that the correct answer has a higher likelihood than all other possible answers. In contrast to several existing approaches that treat Visual QA as multi-way classification, the proposed approach takes the semantic relationships among answers (as characterized by the embeddings) into consideration, instead of viewing them as independent ordinal numbers. Thus, the learned embedding function can be used to embed answers unseen in the training dataset. These properties make the approach particularly appealing for transfer learning in open-ended Visual QA, where the source dataset on which the model is learned has limited overlap with the target dataset in the space of answers. We have also developed large-scale optimization techniques for applying the model to datasets with a large number of answers, where the challenge is to properly normalize the proposed probabilistic models. We validate our approach on several Visual QA datasets and investigate its utility for transferring models across datasets. The empirical results show that the approach performs well not only on in-domain learning but also on transfer learning.
Citations: 29
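As a rough illustration of the two-embedding scoring idea described in the abstract, the sketch below embeds the (image, question) pair with one function and each candidate answer with another, then takes a softmax over their inner products so that training can push likelihood onto the correct answer. The function names, dimensions, and random stand-ins for the learned embeddings are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def answer_likelihoods(iq_embedding, answer_embeddings):
    """Score every candidate answer against a joint image-question embedding.

    iq_embedding:       (d,)   joint embedding of the image and question
    answer_embeddings:  (K, d) one learned embedding per candidate answer
    Returns a length-K probability vector (softmax over inner products).
    """
    logits = answer_embeddings @ iq_embedding      # (K,)
    logits -= logits.max()                         # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy usage: 4 candidate answers embedded in an 8-dim space.
rng = np.random.default_rng(0)
f_iq = rng.normal(size=8)          # stands in for a learned f(image, question)
g_ans = rng.normal(size=(4, 8))    # stands in for a learned g(answer)
p = answer_likelihoods(f_iq, g_ans)
print(p, p.argmax())               # training would maximize the correct answer's probability
```

Because answers are scored through their embeddings rather than fixed class indices, any new answer can be scored at test time simply by embedding it, which is what makes the transfer-learning setting in the abstract possible.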
PoTion: Pose MoTion Representation for Action Recognition
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00734
Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, C. Schmid
Abstract: Most state-of-the-art methods for action recognition rely on a two-stream architecture that processes appearance and motion independently. In this paper, we claim that considering them jointly offers rich information for action recognition. We introduce a novel representation that gracefully encodes the movement of some semantic keypoints. We use the human joints as these keypoints and term our Pose moTion representation PoTion. Specifically, we first run a state-of-the-art human pose estimator [4] and extract heatmaps for the human joints in each frame. We obtain our PoTion representation by temporally aggregating these probability maps. This is achieved by "colorizing" each of them depending on the relative time of the frame in the video clip and summing them. This fixed-size representation for an entire video clip is suitable for classifying actions using a shallow convolutional neural network. Our experimental evaluation shows that PoTion outperforms other state-of-the-art pose representations [6, 48]. Furthermore, it is complementary to standard appearance and motion streams. When combining PoTion with the recent two-stream I3D approach [5], we obtain state-of-the-art performance on the JHMDB, HMDB and UCF101 datasets.
Citations: 242
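The aggregation step the abstract describes (weight each frame's joint heatmaps by a colour that encodes relative time, then sum over frames) can be sketched as below. The linear colour ramp and channel count are illustrative assumptions; the paper's exact colorization scheme may differ.

```python
import numpy as np

def potion_aggregate(heatmaps, num_channels=3):
    """Aggregate per-frame joint heatmaps into a fixed-size, time-colorized map.

    heatmaps: (T, J, H, W) probability maps for J joints over T frames.
    Returns:  (J, H, W, num_channels), where each frame's heatmap is weighted by
              a colour encoding its relative time t/(T-1) and summed over time.
    """
    T, J, H, W = heatmaps.shape
    anchors = np.linspace(0.0, 1.0, num_channels)        # one anchor time per channel
    out = np.zeros((J, H, W, num_channels))
    for t in range(T):
        s = t / max(T - 1, 1)                            # relative time in [0, 1]
        # hat-shaped weights: a frame contributes mostly to its nearest channels
        weights = np.clip(1.0 - np.abs(anchors - s) * (num_channels - 1), 0.0, 1.0)
        out += heatmaps[t][..., None] * weights          # broadcast over (J, H, W)
    return out

# Toy usage: 10 frames, 15 joints, 64x64 heatmaps.
maps = np.random.rand(10, 15, 64, 64)
rep = potion_aggregate(maps)
print(rep.shape)   # (15, 64, 64, 3) -- fixed size regardless of clip length
```

The key property is that the output shape does not depend on the clip length, which is why a shallow CNN can classify the resulting maps directly.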
Fast and Robust Estimation for Unit-Norm Constrained Linear Fitting Problems
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00850
Daiki Ikami, T. Yamasaki, K. Aizawa
Abstract: The M-estimator with iteratively reweighted least squares (IRLS) is one of the best-known methods for robust estimation. However, IRLS is ineffective for robust unit-norm constrained linear fitting (UCLF) problems, such as fundamental matrix estimation, because of a poor initial solution. We overcome this problem by developing a novel objective function and its optimization, named iteratively reweighted eigenvalues minimization (IREM). IREM is guaranteed to decrease the objective function and achieves fast convergence and high robustness. In robust fundamental matrix estimation, IREM runs approximately 5-500 times faster than random sample consensus (RANSAC) while preserving comparable or superior robustness.
Citations: 10
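A minimal sketch of the general idea, reweighted unit-norm constrained fitting, follows: each iteration recomputes robust weights from the current residuals and then solves the weighted, unit-norm constrained least-squares problem exactly as a smallest-eigenvector problem. The Cauchy-style weight and the toy line-fitting example are illustrative assumptions, not the paper's specific formulation.

```python
import numpy as np

def robust_unit_norm_fit(A, iters=20, eps=1e-6):
    """Robustly fit a unit-norm x that makes the residuals A @ x small.

    Each step: weights from current residuals, then the unit-norm weighted
    least-squares solution is the eigenvector of A^T W A with the smallest
    eigenvalue (the general flavour of reweighted eigenvalue minimization).
    """
    _, V = np.linalg.eigh(A.T @ A)
    x = V[:, 0]                                   # unweighted initialization
    for _ in range(iters):
        r = A @ x                                 # residuals under current estimate
        w = 1.0 / (r ** 2 + eps)                  # downweight outliers (illustrative kernel)
        M = A.T @ (w[:, None] * A)                # weighted normal matrix
        _, V = np.linalg.eigh(M)
        x = V[:, 0]                               # smallest-eigenvalue eigenvector
    return x

# Toy usage: points on a line through the origin with unit normal n, plus outliers.
rng = np.random.default_rng(1)
n_true = np.array([0.6, 0.8])
pts = rng.normal(size=(100, 2))
inliers = pts - np.outer(pts @ n_true, n_true) + 0.01 * rng.normal(size=(100, 2))
outliers = rng.uniform(-5, 5, size=(20, 2))
n_est = robust_unit_norm_fit(np.vstack([inliers, outliers]))
print(abs(n_est @ n_true))   # close to 1 (up to sign)
```

The same pattern applies to fundamental matrix estimation, where x is the vectorized matrix and each row of A comes from one point correspondence.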
3D Object Detection with Latent Support Surfaces
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00104
Zhile Ren, Erik B. Sudderth
Abstract: We develop a 3D object detection algorithm that uses latent support surfaces to capture contextual relationships in indoor scenes. Existing 3D representations for RGB-D images capture the local shape and appearance of object categories, but have limited power to represent objects with different visual styles. The detection of small objects is also challenging because the search space is very large in 3D scenes. However, we observe that much of the shape variation within 3D object categories can be explained by the location of a latent support surface, and smaller objects are often supported by larger objects. Therefore, we explicitly use latent support surfaces to better represent the 3D appearance of large objects, and provide contextual cues to improve the detection of small objects. We evaluate our model on 19 object categories from the SUN RGB-D database, and demonstrate state-of-the-art performance.
Citations: 18
A Deeper Look at Power Normalizations
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00605
Piotr Koniusz, Hongguang Zhang, F. Porikli
Abstract: Power Normalizations (PN) are useful non-linear operators in the context of Bag-of-Words data representations as they tackle problems such as feature imbalance. In this paper, we reconsider these operators in the deep learning setup by introducing a novel layer that implements PN for non-linear pooling of feature maps. Specifically, using a kernel formulation, our layer combines the feature vectors and their respective spatial locations in the feature maps produced by the last convolutional layer of a CNN. Linearization of such a kernel results in a positive definite matrix capturing the second-order statistics of the feature vectors, to which the PN operators are applied. We study two types of PN functions, namely (i) MaxExp and (ii) Gamma, addressing their role and meaning in the context of non-linear pooling. We also provide a probabilistic interpretation of these operators and derive their surrogates with well-behaved gradients for end-to-end CNN learning. We apply our theory to practice by implementing the PN layer on a ResNet-50 model and showcase experiments on four benchmarks for fine-grained recognition, scene recognition, and material classification. Our results demonstrate state-of-the-art performance across all these tasks.
Citations: 53
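For orientation, the two PN functions named in the abstract have simple element-wise forms that are often written as MaxExp, 1 - (1 - p)^eta, and Gamma, p^gamma. The sketch below applies them to a second-order pooled matrix; the scaling of the matrix into [0, 1] for MaxExp and the signed power for Gamma are illustrative assumptions, not the paper's exact layer.

```python
import numpy as np

def second_order_pool(features):
    """Second-order (autocorrelation) pooling of feature vectors: (N, d) -> (d, d)."""
    return features.T @ features / len(features)

def maxexp(M, eta=10.0):
    """MaxExp power normalization, 1 - (1 - p)^eta, applied element-wise.
    Entries are scaled into [0, 1] first (a simple illustrative choice)."""
    P = np.clip(M / (np.abs(M).max() + 1e-12), 0.0, 1.0)
    return 1.0 - (1.0 - P) ** eta

def gamma_norm(M, gamma=0.5):
    """Gamma power normalization: element-wise signed power sign(p) * |p|^gamma."""
    return np.sign(M) * np.abs(M) ** gamma

# Toy usage on random "last-conv-layer" feature vectors.
feats = np.random.rand(200, 16)       # 200 spatial locations, 16-dim features
M = second_order_pool(feats)
print(maxexp(M).shape, gamma_norm(M).shape)   # both (16, 16)
```

Both operators compress large co-occurrence values and boost small ones, which is how they counteract the feature imbalance mentioned in the abstract.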
Inverse Composition Discriminative Optimization for Point Cloud Registration
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00316
J. Vongkulbhisal, Beñat Irastorza Ugalde, F. D. L. Torre, J. Costeira
Abstract: Rigid Point Cloud Registration (PCReg) refers to the problem of finding the rigid transformation between two sets of point clouds. This problem is particularly important due to the advances in new 3D sensing hardware, and it is challenging because neither the correspondences nor the transformation parameters are known. Traditional local PCReg methods (e.g., ICP) rely on local optimization algorithms, which can get trapped in bad local minima in the presence of noise, outliers, bad initializations, etc. To alleviate these issues, this paper proposes Inverse Composition Discriminative Optimization (ICDO), an extension of Discriminative Optimization (DO), which learns a sequence of update steps from synthetic training data that search the parameter space for an improved solution. Unlike DO, ICDO is object-independent and generalizes even to unseen shapes. We evaluate ICDO on both synthetic and real data, and show that ICDO can match the speed and outperform the accuracy of state-of-the-art PCReg algorithms.
Citations: 22
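The "learned sequence of update steps" in the Discriminative Optimization family can be read as applying one regressed linear map per iteration to a feature vector extracted from the current parameter estimate. The sketch below shows only that application loop; the feature function, the maps, and the toy quadratic example are illustrative assumptions, not ICDO's actual training or features.

```python
import numpy as np

def apply_learned_updates(x0, update_maps, feature_fn):
    """Apply a learned sequence of update steps, DO-style:
    x_{k+1} = x_k - D_k @ h(x_k), where each D_k was regressed offline
    (e.g. from synthetic training data) and h extracts features from the
    current parameter estimate."""
    x = x0.copy()
    for D in update_maps:                 # one learned linear map per step
        x = x - D @ feature_fn(x)
    return x

# Toy usage: "learned" maps that happen to implement gradient steps on ||x - t||^2.
target = np.array([1.0, -2.0])
maps = [0.5 * np.eye(2) for _ in range(10)]
x_final = apply_learned_updates(np.zeros(2), maps, lambda x: x - target)
print(x_final)   # converges toward target
```

In registration, x would hold the rigid transformation parameters and h would summarize how the currently transformed source points relate to the target cloud.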
Soccer on Your Tabletop
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00498
Konstantinos Rematas, Ira Kemelmacher-Shlizerman, B. Curless, S. Seitz
Abstract: We present a system that transforms a monocular video of a soccer game into a moving 3D reconstruction, in which the players and field can be rendered interactively with a 3D viewer or through an Augmented Reality device. At the heart of our paper is an approach to estimate the depth map of each player, using a CNN that is trained on 3D player data extracted from soccer video games. We compare with state-of-the-art body pose and depth estimation techniques, and show results on both synthetic ground-truth benchmarks and real YouTube soccer footage.
Citations: 76
Depth and Transient Imaging with Compressive SPAD Array Cameras
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00036
Qilin Sun, Xiong Dun, Yifan Peng, W. Heidrich
Abstract: Time-of-flight depth imaging and transient imaging are two imaging modalities that have recently received a lot of interest. Despite much research, existing hardware systems are limited either in terms of temporal resolution or are prohibitively expensive. Arrays of Single Photon Avalanche Diodes (SPADs) promise to fill this gap by providing higher temporal resolution at an affordable cost. Unfortunately, SPAD arrays are to date only available at relatively small resolutions. In this work we aim to overcome the spatial resolution limit of SPAD arrays by employing a compressive sensing camera design. Using a DMD and custom optics, we achieve an image resolution of up to 800×400 on SPAD arrays of resolution 64×32. Using our new data fitting model for the time histograms, we suppress the noise while extracting the phase and amplitude information, so as to realize a temporal resolution of a few tens of picoseconds.
Citations: 17
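The spatial super-resolution step rests on a standard compressive-sensing model: each low-resolution measurement is a coded linear combination of high-resolution scene values, y = Phi x, from which a (sparse) x is recovered. The sketch below is a generic ISTA reconstruction under that model; the Gaussian measurement matrix, sparsity level, and parameters are stand-ins, and real SPAD data would additionally need the paper's time-histogram fitting and Poisson photon statistics.

```python
import numpy as np

def ista(Phi, y, lam=0.01, iters=500):
    """Recover a sparse signal x from compressive measurements y = Phi @ x
    by iterative soft-thresholding (ISTA). Generic sketch, not the paper's solver."""
    L = np.linalg.norm(Phi, 2) ** 2                   # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        z = x - Phi.T @ (Phi @ x - y) / L             # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

# Toy usage: 64 coded measurements (stand-ins for DMD patterns) of a 256-dim,
# 8-sparse signal.
rng = np.random.default_rng(2)
Phi = rng.normal(size=(64, 256)) / np.sqrt(64)
x_true = np.zeros(256)
x_true[rng.choice(256, size=8, replace=False)] = 1.0
x_hat = ista(Phi, Phi @ x_true)
print(sorted(np.argsort(-np.abs(x_hat))[:8]))   # should largely match the true support
print(sorted(np.flatnonzero(x_true)))
```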
Tensorize, Factorize and Regularize: Robust Visual Relationship Learning
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00112
Seong Jae Hwang, Sathya Ravi, Zirui Tao, Hyunwoo J. Kim, Maxwell D. Collins, Vikas Singh
Abstract: Visual relationships provide higher-level information about objects and their relations in an image - this enables a semantic understanding of the scene and helps downstream applications. Given a set of localized objects in some training data, visual relationship detection seeks to detect the most likely "relationship" between objects in a given image. While the specific objects may be well represented in training data, their relationships may still be infrequent. The empirical distribution obtained from seeing these relationships in a dataset does not model the underlying distribution well - a serious issue for most learning methods. In this work, we start from a simple multi-relational learning model which, in principle, offers a rich formalization for deriving a strong prior for learning visual relationships. While the inference problem for deriving the regularizer is challenging, our main technical contribution is to show how adapting recent results in numerical linear algebra leads to efficient algorithms for a factorization scheme that yields highly informative priors. The factorization provides sample-size bounds for inference (under mild conditions) for the underlying [object, predicate, object] relationship learning task on its own, and surprisingly outperforms existing methods in some cases even without utilizing visual features. Then, when integrated with an end-to-end architecture for visual relationship detection that leverages image data, we substantially improve the state of the art.
Citations: 57
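One standard instance of the multi-relational factorization family the abstract refers to scores an [object, predicate, object] triple bilinearly through low-rank entity embeddings and per-predicate matrices. The RESCAL-style parameterization below is shown purely as an illustration of that idea, not as the paper's exact model.

```python
import numpy as np

def triple_score(entity_emb, relation_mats, s, p, o):
    """Bilinear multi-relational score for a [subject, predicate, object] triple:
    score(s, p, o) = e_s^T R_p e_o."""
    return entity_emb[s] @ relation_mats[p] @ entity_emb[o]

# Toy usage: 50 object categories, 10 predicates, rank-8 embeddings.
rng = np.random.default_rng(3)
E = rng.normal(size=(50, 8))       # one embedding per object category
R = rng.normal(size=(10, 8, 8))    # one relation matrix per predicate
print(triple_score(E, R, s=3, p=1, o=7))
# Fitting E and R to the observed triples yields a smoothed, low-rank prior over
# relationships that can score combinations never (or rarely) seen in training.
```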
Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00553
Yanyu Xu, Zhixin Piao, Shenghua Gao
Abstract: Pedestrian trajectory prediction is a challenging task because of the complex nature of humans. In this paper, we tackle the problem within a deep learning framework by considering the motion information of each pedestrian and its interaction with the crowd. Specifically, motivated by residual learning in deep learning, we propose to predict the displacement between neighboring frames for each pedestrian sequentially. To predict such displacement, we design a crowd interaction deep neural network (CIDNN) which considers the different importance of different pedestrians for the displacement prediction of a target pedestrian. Specifically, we use an LSTM to model the motion information of all pedestrians and use a multi-layer perceptron to map the location of each pedestrian to a high-dimensional feature space, where the inner product between features is used as a measure of the spatial affinity between two pedestrians. We then weight the motion features of all pedestrians by their spatial affinity to the target pedestrian for location displacement prediction. Extensive experiments on publicly available datasets validate the effectiveness of our method for trajectory prediction.
Citations: 207
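The affinity-weighting step described in the abstract, in which every pedestrian's motion feature is weighted by its spatial affinity (an inner product of location embeddings) to the target pedestrian, can be sketched as follows. The softmax normalization and all tensor shapes are illustrative assumptions; in the full model the motion features would come from an LSTM and the location embeddings from an MLP.

```python
import numpy as np

def crowd_context(motion_feats, loc_emb, target_idx):
    """Weight every pedestrian's motion feature by its spatial affinity to a target.

    motion_feats: (N, d_m) per-pedestrian motion features (e.g. LSTM outputs)
    loc_emb:      (N, d_l) per-pedestrian location embeddings (e.g. MLP outputs)
    Returns a (d_m,) context vector used to predict the target's next displacement.
    """
    affinity = loc_emb @ loc_emb[target_idx]     # inner-product affinity, shape (N,)
    affinity -= affinity.max()
    w = np.exp(affinity)
    w /= w.sum()                                 # normalize affinities into weights
    return w @ motion_feats                      # weighted sum of motion features

# Toy usage: 6 pedestrians, 32-dim motion features, 16-dim location embeddings.
rng = np.random.default_rng(4)
ctx = crowd_context(rng.normal(size=(6, 32)), rng.normal(size=(6, 16)), target_idx=0)
print(ctx.shape)   # (32,)
```

A final regression layer would map this context vector to the target pedestrian's displacement for the next frame.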