2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Image retrieval using scene graphs
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298990
Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei
Abstract: This paper develops a novel framework for semantic image retrieval based on the notion of a scene graph. Our scene graphs represent objects (“man”, “boat”), attributes of objects (“boat is white”) and relationships between objects (“man standing on boat”). We use these scene graphs as queries to retrieve semantically related images. To this end, we design a conditional random field model that reasons about possible groundings of scene graphs to test images. The likelihoods of these groundings are used as ranking scores for retrieval. We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval. In particular, we evaluate retrieval using full scene graphs and small scene subgraphs, and show that our method outperforms retrieval methods that use only objects or low-level image features. In addition, we show that our full model can be used to improve object localization compared to baseline methods.
Citations: 890
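A minimal sketch of the data structure and scoring pattern described in the abstract, not the paper's CRF: a scene graph holds objects, attributes and relationships, and a grounding assigns each object node a candidate box in the test image. The `unary`, `attr_score` and `rel_score` callables are hypothetical stand-ins for the appearance models whose log-likelihoods the CRF combines into a ranking score.

```python
# Sketch only: factorised grounding score, assuming hypothetical per-factor models.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    objects: list                                      # e.g. ["man", "boat"]
    attributes: list = field(default_factory=list)     # e.g. [(1, "white")] -> "boat is white"
    relations: list = field(default_factory=list)      # e.g. [(0, "standing on", 1)]

def grounding_score(graph, grounding, unary, attr_score, rel_score):
    """grounding[i] is the box chosen for object i; the *_score callables
    return per-factor log-probabilities (assumed interfaces)."""
    s = sum(unary(graph.objects[i], grounding[i]) for i in range(len(graph.objects)))
    s += sum(attr_score(a, grounding[i]) for i, a in graph.attributes)
    s += sum(rel_score(r, grounding[i], grounding[j]) for i, r, j in graph.relations)
    return s

# Toy usage: constant factors just to show the call pattern.
g = SceneGraph(objects=["man", "boat"],
               attributes=[(1, "white")],
               relations=[(0, "standing on", 1)])
boxes = {0: (10, 20, 50, 120), 1: (5, 60, 200, 140)}
print(grounding_score(g, boxes,
                      unary=lambda cls, box: 0.0,
                      attr_score=lambda attr, box: 0.0,
                      rel_score=lambda rel, b1, b2: 0.0))
```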
Mind's eye: A recurrent visual representation for image caption generation
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298856
Xinlei Chen, C. L. Zitnick
Abstract: In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read. The representation automatically learns to remember long-term visual concepts. Our model is capable of both generating novel captions given an image, and reconstructing visual features given an image description. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions are equal to or preferred by humans 21.0% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
Citations: 479
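To make the "bi-directional" idea concrete, here is a toy recurrent step, not the paper's architecture: a hidden state acts as a visual memory that is updated as words arrive, with one head predicting the next word (generation) and another reconstructing visual features (reading). The dimensions and weights are arbitrary assumptions.

```python
# Minimal sketch: hidden state as visual memory, word head + visual-reconstruction head.
import numpy as np

rng = np.random.default_rng(0)
V, D, H = 1000, 256, 512              # vocab size, visual-feature dim, hidden dim (assumed)
Wx = rng.normal(0, 0.01, (H, V))      # input (word) weights
Wh = rng.normal(0, 0.01, (H, H))      # recurrent weights
Wy = rng.normal(0, 0.01, (V, H))      # next-word prediction head
Wv = rng.normal(0, 0.01, (D, H))      # visual-feature reconstruction head

def step(word_onehot, h):
    h_new = np.tanh(Wx @ word_onehot + Wh @ h)   # update the visual memory
    word_logits = Wy @ h_new                     # generation: scores for the next word
    visual_recon = Wv @ h_new                    # reading: reconstructed visual features
    return h_new, word_logits, visual_recon

h = np.zeros(H)
for w in [3, 17, 42]:                            # toy word indices
    x = np.zeros(V); x[w] = 1.0
    h, logits, v_hat = step(x, h)
print(logits.shape, v_hat.shape)                 # (1000,), (256,)
```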
RGBD-fusion: Real-time high precision depth recovery
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7299179
Roy Or-El, G. Rosman, Aaron Wetzler, R. Kimmel, A. Bruckstein
Abstract: The popularity of low-cost RGB-D scanners is increasing on a daily basis. Nevertheless, existing scanners often cannot capture subtle details in the environment. We present a novel method to enhance the depth map by fusing the intensity and depth information to create more detailed range profiles. The lighting model we use can handle natural scene illumination. It is integrated in a shape from shading like technique to improve the visual fidelity of the reconstructed object. Unlike previous efforts in this domain, the detailed geometry is calculated directly, without the need to explicitly find and integrate surface normals. In addition, the proposed method operates four orders of magnitude faster than the state of the art. Qualitative and quantitative visual and statistical evidence support the improvement in the depth obtained by the suggested method.
Citations: 109
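The paper solves a shading-based refinement directly; as a much simpler stand-in for the general idea of injecting intensity detail into a coarse depth map, the sketch below applies a joint bilateral filter guided by the aligned intensity image. This is explicitly not the authors' method, and the parameters are illustrative.

```python
# Stand-in sketch: intensity-guided (joint bilateral) smoothing of a depth map.
import numpy as np

def joint_bilateral_depth(depth, intensity, radius=3, sigma_s=2.0, sigma_r=0.1):
    H, W = depth.shape
    out = np.zeros_like(depth)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    d = np.pad(depth, radius, mode="edge")
    g = np.pad(intensity, radius, mode="edge")
    for y in range(H):
        for x in range(W):
            dwin = d[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            gwin = g[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            rangew = np.exp(-((gwin - intensity[y, x]) ** 2) / (2 * sigma_r**2))
            w = spatial * rangew                  # weights favour similar-intensity pixels
            out[y, x] = (w * dwin).sum() / w.sum()
    return out

# Toy usage on random arrays; a real pipeline would use registered RGB-D frames.
rng = np.random.default_rng(0)
refined = joint_bilateral_depth(rng.random((64, 64)), rng.random((64, 64)))
print(refined.shape)
```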
Reliable Patch Trackers: Robust visual tracking by exploiting reliable patches
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298632
Yang Li, Jianke Zhu, S. Hoi
Abstract: Most modern trackers typically employ a bounding box given in the first frame to track visual objects, where their tracking results are often sensitive to the initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential Monte Carlo framework. As the reliable patches are distributed over the image, we exploit the motion trajectories to distinguish them from the background. Therefore, the visual object can be defined as the clustering of homo-trajectory patches, where a Hough voting-like scheme is employed to estimate the target state. Encouraging experimental results on a large set of sequences showed that the proposed approach is very effective in comparison to the state-of-the-art trackers. The full source code of our implementation will be publicly available.
Citations: 349
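A hedged sketch of two ingredients the abstract names, with assumed interfaces: a sequential Monte Carlo style re-weighting of patch particles by a per-patch matching confidence (here random numbers standing in for the tracker's reliability metric), and a Hough-voting-like estimate of the target centre from the reliable patches.

```python
# Sketch only: particle re-weighting + weighted voting for the target centre.
import numpy as np

rng = np.random.default_rng(1)

def smc_update(weights, confidences):
    """Re-weight patch particles by matching confidence and renormalise."""
    w = weights * confidences
    return w / w.sum()

def hough_vote_center(patch_positions, offsets, weights):
    """Each patch votes for the centre via its stored offset; return the weighted mean vote."""
    votes = patch_positions + offsets
    return (weights[:, None] * votes).sum(axis=0)

n = 50
weights = np.full(n, 1.0 / n)                    # uniform prior over patches
positions = rng.uniform(0, 100, size=(n, 2))     # current patch locations (toy)
offsets = rng.normal(0, 1, size=(n, 2))          # patch-to-centre offsets (toy)
confidences = rng.uniform(0.1, 1.0, size=n)      # stand-in for the reliability metric

weights = smc_update(weights, confidences)
reliable = weights > (1.0 / n)                   # keep patches above average reliability
center = hough_vote_center(positions[reliable], offsets[reliable],
                           weights[reliable] / weights[reliable].sum())
print(center)
```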
Clustering of static-adaptive correspondences for deformable object tracking
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298895
G. Nebehay, R. Pflugfelder
Abstract: We propose a novel method for establishing correspondences on deformable objects for single-target object tracking. The key ingredient is a dissimilarity measure between correspondences that takes into account their geometric compatibility, allowing us to separate inlier correspondences from outliers. We employ both static correspondences from the initial appearance of the object as well as adaptive correspondences from the previous frame to address the stability-plasticity dilemma. The geometric dissimilarity measure enables us to also disambiguate keypoints that are difficult to match. Based on these ideas we build a keypoint-based tracker that outputs rotated bounding boxes. We demonstrate in a rigorous empirical analysis that this tracker outperforms the state of the art on a dataset of 77 sequences.
Citations: 187
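The sketch below illustrates the pipeline shape only, under simplifying assumptions: a pairwise dissimilarity between correspondences that penalises geometric incompatibility (here just disagreement of displacement vectors), followed by hierarchical clustering so that the large consistent cluster is treated as inliers. The paper's actual measure and clustering differ.

```python
# Sketch: pairwise geometric dissimilarity between correspondences, then clustering.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)

# Toy correspondences: 40 inliers follow a common translation, 10 are outliers.
src = rng.uniform(0, 100, size=(50, 2))
dst = src + np.array([5.0, -3.0])
dst[40:] = rng.uniform(0, 100, size=(10, 2))

disp = dst - src                                              # per-correspondence motion
diff = np.linalg.norm(disp[:, None, :] - disp[None, :, :], axis=-1)
condensed = squareform(diff, checks=False)                    # condensed pairwise distances

labels = fcluster(linkage(condensed, method="average"), t=2.0, criterion="distance")
inlier_label = np.bincount(labels).argmax()                   # largest cluster = inliers
print((labels == inlier_label).sum(), "correspondences kept as inliers")
```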
Multi-view feature engineering and learning
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298945
Jingming Dong, Nikolaos Karianakis, Damek Davis, Joshua Hernandez, Jonathan Balzer, Stefano Soatto
Abstract: We frame the problem of local representation of imaging data as the computation of minimal sufficient statistics that are invariant to nuisance variability induced by viewpoint and illumination. We show that, under very stringent conditions, these are related to “feature descriptors” commonly used in Computer Vision. Such conditions can be relaxed if multiple views of the same scene are available. We propose a sampling-based and a point-estimate based approximation of such a representation, compared empirically on image-to-(multiple)image matching, for which we introduce a multi-view wide-baseline matching benchmark, consisting of a mixture of real and synthetic objects with ground truth camera motion and dense three-dimensional geometry.
Citations: 27
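A toy illustration of the distinction the abstract draws, under loose assumptions rather than the paper's construction: given several descriptors of the same scene point seen from different views, a sampling-based representation keeps the whole set and matches by the best per-sample distance, while a point-estimate representation collapses them to a single (here, mean) descriptor.

```python
# Toy comparison of a sampling-based vs. point-estimate multi-view descriptor.
import numpy as np

rng = np.random.default_rng(3)
views = rng.normal(0, 1, size=(6, 128))                 # 6 views, 128-D descriptors (toy)
views /= np.linalg.norm(views, axis=1, keepdims=True)

point_estimate = views.mean(axis=0)                     # point-estimate representation
point_estimate /= np.linalg.norm(point_estimate)

query = views[0] + 0.1 * rng.normal(0, 1, 128)          # query from a nearby viewpoint
query /= np.linalg.norm(query)

dist_sampling = np.min(np.linalg.norm(views - query, axis=1))   # best over stored samples
dist_point = np.linalg.norm(point_estimate - query)             # single comparison
print(dist_sampling, dist_point)
```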
Efficient sparse-to-dense optical flow estimation using a learned basis and layers
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298607
Jonas Wulff, Michael J. Black
Abstract: We address the elusive goal of estimating optical flow both accurately and efficiently by adopting a sparse-to-dense approach. Given a set of sparse matches, we regress to dense optical flow using a learned set of full-frame basis flow fields. We learn the principal components of natural flow fields using flow computed from four Hollywood movies. Optical flow fields are then compactly approximated as a weighted sum of the basis flow fields. Our new PCA-Flow algorithm robustly estimates these weights from sparse feature matches. The method runs in under 200ms/frame on the MPI-Sintel dataset using a single CPU and is more accurate and significantly faster than popular methods such as LDOF and Classic+NL. For some applications, however, the results are too smooth. Consequently, we develop a novel sparse layered flow method in which each layer is represented by PCA-Flow. Unlike existing layered methods, estimation is fast because it uses only sparse matches. We combine information from different layers into a dense flow field using an image-aware MRF. The resulting PCA-Layers method runs in 3.2s/frame, is significantly more accurate than PCA-Flow, and achieves state-of-the-art performance in occluded regions on MPI-Sintel.
Citations: 173
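The core sparse-to-dense regression is easy to sketch: represent the dense flow as a weighted sum of basis flow fields and solve for the weights from sparse matches. In the sketch below the basis is random (standing in for the PCA basis learned from movie flow) and plain ridge least squares stands in for the paper's robust estimator.

```python
# Sketch: flow as a weighted sum of basis fields; weights from sparse matches by ridge LS.
import numpy as np

rng = np.random.default_rng(4)
H, W, K = 48, 64, 16                                   # frame size and number of basis fields
basis = rng.normal(0, 1, size=(K, H, W, 2))            # stand-in for learned PCA basis flows

# Sparse matches: pixel locations and their measured flow vectors (synthetic toy data).
n = 200
ys = rng.integers(0, H, n)
xs = rng.integers(0, W, n)
true_w = rng.normal(0, 1, K)
measured = np.einsum("k,knc->nc", true_w, basis[:, ys, xs, :]) + 0.01 * rng.normal(size=(n, 2))

# Each row of A is one (pixel, u/v component) equation in the K unknown weights.
A = basis[:, ys, xs, :].reshape(K, -1).T               # (2n, K)
b = measured.reshape(-1)                               # (2n,)
lam = 1e-2                                             # small ridge term for stability
w = np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)

dense_flow = np.einsum("k,khwc->hwc", w, basis)        # reconstructed dense flow field
print(dense_flow.shape, np.allclose(w, true_w, atol=0.05))
```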
Saliency-aware geodesic video object segmentation
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298961
Wenguan Wang, Jianbing Shen, F. Porikli
Abstract: We introduce an unsupervised, geodesic distance based, salient video object segmentation method. Unlike traditional methods, our method incorporates saliency as a prior for the object via the computation of robust geodesic measurements. We consider two discriminative visual features: spatial edges and temporal motion boundaries as indicators of foreground object locations. We first generate framewise spatiotemporal saliency maps using geodesic distance from these indicators. Building on the observation that foreground areas are surrounded by regions with high spatiotemporal edge values, geodesic distance provides an initial estimation for foreground and background. Then, high-quality saliency results are produced via the geodesic distances to background regions in the subsequent frames. Through the resulting saliency maps, we build global appearance models for foreground and background. By imposing motion continuity, we establish a dynamic location model for each frame. Finally, the spatiotemporal saliency maps, appearance models and dynamic location models are combined into an energy minimization framework to attain both spatially and temporally coherent object segmentation. Extensive quantitative and qualitative experiments on a benchmark video dataset demonstrate the superiority of the proposed method over the state-of-the-art algorithms.
Citations: 474
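One building block can be sketched concretely, with assumed details: a geodesic distance map on a 4-connected pixel grid where the step cost comes from an edge/motion-boundary map, computed by Dijkstra from seed pixels (for example, image-border pixels as background seeds). The full method combines several such maps with appearance and location models in an energy minimisation, which is not reproduced here.

```python
# Sketch: geodesic distance over an edge-cost grid from a set of seed pixels.
import heapq
import numpy as np

def geodesic_distance(edge_map, seeds):
    H, W = edge_map.shape
    dist = np.full((H, W), np.inf)
    heap = []
    for y, x in seeds:
        dist[y, x] = 0.0
        heapq.heappush(heap, (0.0, y, x))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                nd = d + edge_map[ny, nx]          # cost of stepping onto the neighbour
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist

# Toy usage: border pixels as background seeds, a bright blob as the "edge" indicator.
edges = np.zeros((40, 40)); edges[15:25, 15:25] = 1.0
border = [(0, x) for x in range(40)] + [(39, x) for x in range(40)] \
       + [(y, 0) for y in range(40)] + [(y, 39) for y in range(40)]
saliency = geodesic_distance(edges, border)        # large inside the blob
print(saliency.max())
```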
Practical robust two-view translation estimation
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298884
Johan Fredriksson, Viktor Larsson, Carl Olsson
Abstract: Outliers pose a problem in all real structure-from-motion systems. Due to the use of automatic matching methods one has to expect that a (sometimes very large) portion of the detected correspondences can be incorrect. In this paper we propose a method that estimates the relative translation between two cameras and simultaneously maximizes the number of inlier correspondences. Traditionally, outlier removal tasks have been addressed using RANSAC approaches. However, these are random in nature and offer no guarantees of finding a good solution. If the amount of mismatches is large, the approach becomes costly because of the need to evaluate a large number of random samples. In contrast, our approach is based on the branch and bound methodology which guarantees that an optimal solution will be found. While most optimal methods trade speed for optimality, the proposed algorithm has competitive running times on problem sizes well beyond what is common in practice. Experiments on both real and synthetic data show that the method outperforms state-of-the-art alternatives, including RANSAC, in terms of solution quality. In addition, the approach is shown to be faster than RANSAC in settings with a large amount of outliers.
Citations: 26
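To show what is being maximised, here is a hedged sketch: with the relative rotation assumed known, the epipolar constraint x2 · (t × R x1) = 0 lets us count inliers for any candidate translation direction t. A coarse scan over directions on the sphere stands in for the paper's branch-and-bound search, which explores the same space with optimality guarantees; the data below are synthetic.

```python
# Sketch: inlier counting for candidate translation directions (not branch and bound).
import numpy as np

rng = np.random.default_rng(5)

def inlier_count(x1, x2, R, t, thresh=0.02):
    """x1, x2: (N, 3) unit bearing vectors; residual is the epipolar error |x2 . (t x R x1)|."""
    Rx1 = x1 @ R.T
    residuals = np.abs(np.einsum("nj,nj->n", x2, np.cross(t, Rx1)))
    return int((residuals < thresh).sum())

def sphere_directions(step=0.05):
    dirs = []
    for theta in np.arange(0, np.pi, step):
        for phi in np.arange(0, 2 * np.pi, step):
            dirs.append([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])
    return np.array(dirs)

# Toy problem: identity rotation, translation along x, 80 inliers + 20 outlier matches.
R = np.eye(3)
t_true = np.array([1.0, 0.0, 0.0])
X = rng.uniform(-1, 1, (100, 3)) + np.array([0, 0, 5.0])
x1 = X / np.linalg.norm(X, axis=1, keepdims=True)
X2 = X - t_true
x2 = X2 / np.linalg.norm(X2, axis=1, keepdims=True)
x2[80:] = rng.normal(size=(20, 3)); x2[80:] /= np.linalg.norm(x2[80:], axis=1, keepdims=True)

best = max(sphere_directions(), key=lambda t: inlier_count(x1, x2, R, t))
print(best, inlier_count(x1, x2, R, best))
```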
From categories to subcategories: Large-scale image classification with partial class label refinement
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2015-06-07 DOI: 10.1109/CVPR.2015.7298619
M. Ristin, Juergen Gall, M. Guillaumin, L. Gool
Abstract: The number of digital images is growing extremely rapidly, and so is the need for their classification. But, as more images of pre-defined categories become available, they also become more diverse and cover finer semantic differences. Ultimately, the categories themselves need to be divided into subcategories to account for that semantic refinement. Image classification in general has improved significantly over the last few years, but it still requires a massive amount of manually annotated data. Subdividing categories into subcategories multiplies the number of labels, aggravating the annotation problem. Hence, we can expect the annotations to be refined only for a subset of the already labeled data, and exploit coarser labeled data to improve classification. In this work, we investigate how coarse category labels can be used to improve the classification of subcategories. To this end, we adopt the framework of Random Forests and propose a regularized objective function that takes into account relations between categories and subcategories. Compared to approaches that disregard the extra coarse labeled data, we achieve a relative improvement in subcategory classification accuracy of up to 22% in our large-scale image classification experiments.
Citations: 52
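A simple two-stage stand-in, not the paper's regularised Random Forest objective, just to make the setting concrete: every sample carries a coarse label, only a subset carries a subcategory label; a coarse classifier is trained on everything and one subcategory classifier per coarse class is trained on the refined subset, so the abundant coarse annotations still shape the first decision. Data and hyperparameters below are arbitrary.

```python
# Stand-in sketch: coarse classifier on all data, per-class subcategory classifiers on refined data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(600, 20))
coarse = rng.integers(0, 3, 600)                       # 3 coarse categories, fully labelled
fine = coarse * 2 + rng.integers(0, 2, 600)            # 2 subcategories per coarse class
has_fine = rng.random(600) < 0.3                       # only 30% carry refined labels

coarse_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, coarse)
fine_clfs = {c: RandomForestClassifier(n_estimators=50, random_state=0)
                 .fit(X[(coarse == c) & has_fine], fine[(coarse == c) & has_fine])
             for c in np.unique(coarse)}

def predict_subcategory(x):
    c = int(coarse_clf.predict(x.reshape(1, -1))[0])   # first decide the coarse category
    return int(fine_clfs[c].predict(x.reshape(1, -1))[0])

print(predict_subcategory(X[0]))
```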