Latest Articles from the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Weakly Supervised Actor-Action Segmentation via Robust Multi-task Ranking
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-21 DOI: 10.1109/CVPR.2017.115
Yan Yan, Chenliang Xu, Dawen Cai, Jason J. Corso
{"title":"Weakly Supervised Actor-Action Segmentation via Robust Multi-task Ranking","authors":"Yan Yan, Chenliang Xu, Dawen Cai, Jason J. Corso","doi":"10.1109/CVPR.2017.115","DOIUrl":"https://doi.org/10.1109/CVPR.2017.115","url":null,"abstract":"Fine-grained activity understanding in videos has attracted considerable recent attention with a shift from action classification to detailed actor and action understanding that provides compelling results for perceptual needs of cutting-edge autonomous systems. However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions. To address these issues, in this paper, we propose a novel, robust multi-task ranking model for weakly supervised actor-action segmentation where only video-level tags are given for training samples. Our model is able to share useful information among different actors and actions while learning a ranking matrix to select representative supervoxels for actors and actions respectively. Final segmentation results are generated by a conditional random field that considers various ranking scores for different video parts. Extensive experimental results on the Actor-Action Dataset (A2D) demonstrate that the proposed approach outperforms the state-of-the-art weakly supervised methods and performs as well as the top-performing fully supervised method.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"25 1","pages":"1022-1031"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74149914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
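A toy sketch of the selection idea: each supervoxel is scored with a weight vector that has a component shared across the actor and action tasks plus a task-specific part, and the top-ranked supervoxels are kept as representatives for that task's video-level tag. This is not the authors' robust multi-task ranking (the learned ranking matrix, the robustness terms, and the CRF fusion are all omitted); all data and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy video: 50 supervoxels with 16-D features, and two weakly
# supervised tasks (the video-level actor tag and action tag).
features = rng.normal(size=(50, 16))
shared = rng.normal(size=16) * 0.1               # weights shared across tasks
task_specific = rng.normal(size=(2, 16)) * 0.1   # per-task weights

def rank_supervoxels(task_id, top_k=5):
    """Score every supervoxel with shared + task-specific weights and
    keep the top_k as representatives for that task's tag."""
    scores = features @ (shared + task_specific[task_id])
    return np.argsort(scores)[::-1][:top_k]

print("actor representatives: ", rank_supervoxels(0))
print("action representatives:", rank_supervoxels(1))
```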
Surface Motion Capture Transfer with Gaussian Process Regression
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-21 DOI: 10.1109/CVPR.2017.379
A. Boukhayma, Jean-Sébastien Franco, Edmond Boyer
{"title":"Surface Motion Capture Transfer with Gaussian Process Regression","authors":"A. Boukhayma, Jean-Sébastien Franco, Edmond Boyer","doi":"10.1109/CVPR.2017.379","DOIUrl":"https://doi.org/10.1109/CVPR.2017.379","url":null,"abstract":"We address the problem of transferring motion between captured 4D models. We particularly focus on human subjects for which the ability to automatically augment 4D datasets, by propagating movements between subjects, is of interest in a great deal of recent vision applications that builds on human visual corpus. Given 4D training sets for two subjects for which a sparse set of corresponding keyposes are known, our method is able to transfer a newly captured motion from one subject to the other. With the aim to generalize transfers to input motions possibly very diverse with respect to the training sets, the method contributes with a new transfer model based on non-linear pose interpolation. Building on Gaussian process regression, this model intends to capture and preserve individual motion properties, and thereby realism, by accounting for pose inter-dependencies during motion transfers. Our experiments show visually qualitative, and quantitative, improvements over existing pose-mapping methods and confirm the generalization capabilities of our method compared to state of the art.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"81 1","pages":"3558-3566"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82861896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
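Since the transfer model builds on standard Gaussian process regression, its core can be sketched with scikit-learn: fit a GP on corresponding keyposes of the two subjects, then map each pose of a newly captured motion through it. The 30-D pose vectors, the RBF length scale, and the synthetic data below are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Hypothetical keypose correspondences: each pose is a flattened
# 30-D vector of surface/skeleton coordinates.
src_keyposes = rng.normal(size=(12, 30))                                   # subject A
tgt_keyposes = 1.1 * src_keyposes + rng.normal(scale=0.01, size=(12, 30))  # subject B

# The RBF kernel provides the smooth non-linear pose interpolation;
# the WhiteKernel term absorbs capture noise.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(1e-4),
                               normalize_y=True)
gpr.fit(src_keyposes, tgt_keyposes)

# Transfer a newly captured motion (a sequence of subject-A poses).
new_motion = rng.normal(size=(4, 30))
transferred = gpr.predict(new_motion)
print(transferred.shape)     # (4, 30): the pose sequence mapped to subject B
```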
Latent Multi-view Subspace Clustering
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-21 DOI: 10.1109/CVPR.2017.461
Changqing Zhang, Q. Hu, H. Fu, Peng Fei Zhu, Xiaochun Cao
{"title":"Latent Multi-view Subspace Clustering","authors":"Changqing Zhang, Q. Hu, H. Fu, Peng Fei Zhu, Xiaochun Cao","doi":"10.1109/CVPR.2017.461","DOIUrl":"https://doi.org/10.1109/CVPR.2017.461","url":null,"abstract":"In this paper, we propose a novel Latent Multi-view Subspace Clustering (LMSC) method, which clusters data points with latent representation and simultaneously explores underlying complementary information from multiple views. Unlike most existing single view subspace clustering methods that reconstruct data points using original features, our method seeks the underlying latent representation and simultaneously performs data reconstruction based on the learned latent representation. With the complementarity of multiple views, the latent representation could depict data themselves more comprehensively than each single view individually, accordingly makes subspace representation more accurate and robust as well. The proposed method is intuitive and can be optimized efficiently by using the Augmented Lagrangian Multiplier with Alternating Direction Minimization (ALM-ADM) algorithm. Extensive experiments on benchmark datasets have validated the effectiveness of our proposed method.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"11 1","pages":"4333-4341"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88394593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 314
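A heavily simplified sketch of the subspace-clustering backbone, assuming the latent representation is crudely approximated by stacking the views (LMSC instead learns it jointly with the self-expressive coefficients via ALM-ADM). The closed-form ridge-regularized self-expression used here is a standard stand-in for the paper's optimization; data and parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(2)

# Two hypothetical views of 60 samples drawn from 3 clusters.
labels_true = np.repeat([0, 1, 2], 20)
centers = rng.normal(scale=5.0, size=(3, 8))
view1 = centers[labels_true] + rng.normal(size=(60, 8))
view2 = centers[labels_true][:, :5] + rng.normal(size=(60, 5))

# Crude stand-in for the learned latent representation: stack the views.
H = np.hstack([view1, view2]).T              # features x samples

# Self-expressive coding min ||H - HC||^2 + lam*||C||^2 has the
# closed form C = (H^T H + lam*I)^{-1} H^T H.
lam = 1.0
n = H.shape[1]
C = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ H)

affinity = np.abs(C) + np.abs(C).T           # symmetric, nonnegative affinity
pred = SpectralClustering(n_clusters=3, affinity="precomputed",
                          random_state=0).fit_predict(affinity)
print(pred[:20], pred[20:40], pred[40:])     # should align with the 3 blocks
```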
Direct Photometric Alignment by Mesh Deformation
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-21 DOI: 10.1109/CVPR.2017.289
Kaimo Lin, Nianjuan Jiang, Shuaicheng Liu, L. Cheong, M. Do, Jiangbo Lu
{"title":"Direct Photometric Alignment by Mesh Deformation","authors":"Kaimo Lin, Nianjuan Jiang, Shuaicheng Liu, L. Cheong, M. Do, Jiangbo Lu","doi":"10.1109/CVPR.2017.289","DOIUrl":"https://doi.org/10.1109/CVPR.2017.289","url":null,"abstract":"The choice of motion models is vital in applications like image/video stitching and video stabilization. Conventional methods explored different approaches ranging from simple global parametric models to complex per-pixel optical flow. Mesh-based warping methods achieve a good balance between computational complexity and model flexibility. However, they typically require high quality feature correspondences and suffer from mismatches and low-textured image content. In this paper, we propose a mesh-based photometric alignment method that minimizes pixel intensity difference instead of Euclidean distance of known feature correspondences. The proposed method combines the superior performance of dense photometric alignment with the efficiency of mesh-based image warping. It achieves better global alignment quality than the feature-based counterpart in textured images, and more importantly, it is also robust to low-textured image content. Abundant experiments show that our method can handle a variety of images and videos, and outperforms representative state-of-the-art methods in both image stitching and video stabilization tasks.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"19 1","pages":"2701-2709"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90813396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 47
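A minimal sketch of the objective, assuming a toy translation-only case: displacements at a coarse grid of control vertices are interpolated to a dense field, the source image is warped, and the summed squared intensity difference (rather than feature-correspondence distances) is minimized. A generic derivative-free optimizer stands in for the paper's solver, and the mesh is a plain grid rather than a content-aware one.

```python
import numpy as np
from scipy import ndimage, optimize

rng = np.random.default_rng(3)

# Synthetic target and a source shifted right by 2 pixels: the optimizer
# should recover the shift purely from pixel intensities.
target = ndimage.gaussian_filter(rng.normal(size=(40, 40)), 3)
source = np.roll(target, shift=2, axis=1)

GRID = (3, 3)  # coarse "mesh" of control vertices

def warp(img, params):
    """Warp img by a dense field interpolated from coarse vertex shifts."""
    dx = ndimage.zoom(params[:9].reshape(GRID), np.array(img.shape) / GRID, order=1)
    dy = ndimage.zoom(params[9:].reshape(GRID), np.array(img.shape) / GRID, order=1)
    yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    return ndimage.map_coordinates(img, [yy + dy, xx + dx], order=1, mode="nearest")

def photometric_loss(params):
    return np.sum((warp(source, params) - target) ** 2)

res = optimize.minimize(photometric_loss, np.zeros(18), method="Powell")
print("mean recovered x-shift:", res.x[:9].mean())   # approximately +2
```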
MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-08 DOI: 10.1109/CVPR.2017.378
Zizhao Zhang, Yuanpu Xie, F. Xing, M. McGough, L. Yang
{"title":"MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network","authors":"Zizhao Zhang, Yuanpu Xie, F. Xing, M. McGough, L. Yang","doi":"10.1109/CVPR.2017.378","DOIUrl":"https://doi.org/10.1109/CVPR.2017.378","url":null,"abstract":"The inability to interpret the model prediction in semantically and visually meaningful ways is a well-known shortcoming of most existing computer-aided diagnosis methods. In this paper, we propose MDNet to establish a direct multimodal mapping between medical images and diagnostic reports that can read images, generate diagnostic reports, retrieve images by symptom descriptions, and visualize attention, to provide justifications of the network diagnosis process. MDNet includes an image model and a language model. The image model is proposed to enhance multi-scale feature ensembles and utilization efficiency. The language model, integrated with our improved attention mechanism, aims to read and explore discriminative image feature descriptions from reports to learn a direct mapping from sentence words to image pixels. The overall network is trained end-to-end by using our developed optimization strategy. Based on a pathology bladder cancer images and its diagnostic reports (BCIDR) dataset, we conduct sufficient experiments to demonstrate that MDNet outperforms comparative baselines. The proposed image model obtains state-of-the-art performance on two CIFAR datasets as well.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"47 1","pages":"3549-3557"},"PeriodicalIF":0.0,"publicationDate":"2017-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78418531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 261
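The sentence-word-to-pixel mapping rests on a soft attention mechanism; the sketch below shows the generic additive-attention shape, not MDNet's improved variant. The 7x7 feature grid, the language-model hidden state, and all projection sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical 7x7 CNN feature grid (49 regions, 64-D) and a 64-D
# language-model hidden state for the report word being generated.
feat = rng.normal(size=(49, 64))
hidden = rng.normal(size=64)

# Generic additive attention: one score per region, softmax-normalized.
W_f = rng.normal(size=(64, 32)) * 0.1
W_h = rng.normal(size=(64, 32)) * 0.1
v = rng.normal(size=32) * 0.1

alpha = softmax(np.tanh(feat @ W_f + hidden @ W_h) @ v)
context = alpha @ feat                   # attended image feature for this word

print(alpha.reshape(7, 7).round(3))      # visualizable as a heat map over the image
print(context.shape)
```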
RON: Reverse Connection with Objectness Prior Networks for Object Detection
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-06 DOI: 10.1109/CVPR.2017.557
Tao Kong, F. Sun, Anbang Yao, Huaping Liu, Ming Lu, Yurong Chen
{"title":"RON: Reverse Connection with Objectness Prior Networks for Object Detection","authors":"Tao Kong, F. Sun, Anbang Yao, Huaping Liu, Ming Lu, Yurong Chen","doi":"10.1109/CVPR.2017.557","DOIUrl":"https://doi.org/10.1109/CVPR.2017.557","url":null,"abstract":"We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multi-levels of CNNs. To deal with (b), we propose the objectness prior to significantly reduce the searching space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, thus RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and low resolution 384×384 input size, the network gets 81.3% mAP on PASCAL VOC 2007, 80.7% mAP on PASCAL VOC 2012 datasets. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5G GPU memory at test phase, the speed of the network is 15 FPS, 3 times faster than the Faster R-CNN counterpart. Code will be made publicly available.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"8 1","pages":"5244-5252"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82331932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 384
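A numpy caricature of the two ingredients, assuming nearest-neighbor upsampling and addition in place of RON's learned deconvolution and convolution layers: the reverse connection fuses deep semantics into a shallower map, and the objectness prior prunes locations before per-class detection runs. Shapes and the threshold are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical backbone feature maps at two scales (channels, H, W).
shallow = rng.normal(size=(8, 16, 16))
deep = rng.normal(size=(8, 8, 8))

# Reverse connection: upsample the deeper map and fuse it into the
# shallower one, so shallow levels also see high-level semantics.
fused = shallow + np.repeat(np.repeat(deep, 2, axis=1), 2, axis=2)

# Objectness prior: a per-location score prunes the search space
# before the (more expensive) per-class detector is evaluated.
objectness = 1.0 / (1.0 + np.exp(-fused.mean(axis=0)))   # sigmoid over channels
keep = objectness > 0.55
print(f"locations kept for detection: {keep.sum()} / {keep.size}")
```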
Benchmarking Denoising Algorithms with Real Photographs
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-05 DOI: 10.1109/CVPR.2017.294
Tobias Plötz, S. Roth
{"title":"Benchmarking Denoising Algorithms with Real Photographs","authors":"Tobias Plötz, S. Roth","doi":"10.1109/CVPR.2017.294","DOIUrl":"https://doi.org/10.1109/CVPR.2017.294","url":null,"abstract":"Lacking realistic ground truth data, image denoising techniques are traditionally evaluated on images corrupted by synthesized i.i.d. Gaussian noise. We aim to obviate this unrealistic setting by developing a methodology for benchmarking denoising techniques on real photographs. We capture pairs of images with different ISO values and appropriately adjusted exposure times, where the nearly noise-free low-ISO image serves as reference. To derive the ground truth, careful post-processing is needed. We correct spatial misalignment, cope with inaccuracies in the exposure parameters through a linear intensity transform based on a novel heteroscedastic Tobit regression model, and remove residual low-frequency bias that stems, e.g., from minor illumination changes. We then capture a novel benchmark dataset, the Darmstadt Noise Dataset (DND), with consumer cameras of differing sensor sizes. One interesting finding is that various recent techniques that perform well on synthetic noise are clearly outperformed by BM3D on photographs with real noise. Our benchmark delineates realistic evaluation scenarios that deviate strongly from those commonly used in the scientific literature.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"57 1","pages":"2750-2759"},"PeriodicalIF":0.0,"publicationDate":"2017-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72678157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 466
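The exposure-compensation step can be approximated in a few lines, assuming we sidestep the paper's heteroscedastic Tobit likelihood by simply discarding (near-)clipped pixels and fitting the gain/offset with ordinary least squares. The synthetic pair below mimics intensity-dependent noise and the sensor saturation that makes the censoring-aware Tobit model necessary in the first place.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic low-ISO reference and high-ISO noisy shot of the same scene,
# related by a linear intensity transform (gain/offset drift from the
# adjusted exposure) plus intensity-dependent noise.
reference = rng.uniform(10, 240, size=100_000)
noisy = 1.04 * reference + 3.0 + rng.normal(scale=np.sqrt(reference),
                                            size=reference.shape)
noisy = np.clip(noisy, 0, 255)               # saturation censors the values

# Crude approximation of Tobit regression: drop (near-)clipped pixels
# instead of modeling the censoring in the likelihood, then fit by OLS.
ok = (noisy > 2) & (noisy < 253)
gain, offset = np.polyfit(reference[ok], noisy[ok], deg=1)
print(f"estimated gain={gain:.3f}, offset={offset:.2f}")   # ~1.04, ~3.0
```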
Discover and Learn New Objects from Documentaries
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-01 DOI: 10.1109/CVPR.2017.124
Kai Chen, Hang Song, Chen Change Loy, Dahua Lin
{"title":"Discover and Learn New Objects from Documentaries","authors":"Kai Chen, Hang Song, Chen Change Loy, Dahua Lin","doi":"10.1109/CVPR.2017.124","DOIUrl":"https://doi.org/10.1109/CVPR.2017.124","url":null,"abstract":"Despite the remarkable progress in recent years, detecting objects in a new context remains a challenging task. Detectors learned from a public dataset can only work with a fixed list of categories, while training from scratch usually requires a large amount of training data with detailed annotations. This work aims to explore a novel approach – learning object detectors from documentary films in a weakly supervised manner. This is inspired by the observation that documentaries often provide dedicated exposition of certain object categories, where visual presentations are aligned with subtitles. We believe that object detectors can be learned from such a rich source of information. Towards this goal, we develop a joint probabilistic framework, where individual pieces of information, including video frames and subtitles, are brought together via both visual and linguistic links. On top of this formulation, we further derive a weakly supervised learning algorithm, where object model learning and training set mining are unified in an optimization procedure. Experimental results on a real world dataset demonstrate that this is an effective approach to learning new object detectors.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"23 1","pages":"1111-1120"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73797775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
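A toy version of the mine-and-retrain idea, assuming keyword matching in subtitles stands in for the paper's joint probabilistic model over visual and linguistic links, and simple self-training stands in for its unified optimization. The features, subtitles, and the "lion" category are fabricated for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Hypothetical frame features and subtitle text for a documentary.
n = 200
is_lion = rng.random(n) < 0.3                 # hidden ground truth
frames = rng.normal(size=(n, 10)) + is_lion[:, None] * 1.5
subtitles = ["the lion stalks its prey" if (l and rng.random() < 0.6)
             else "the savanna at dusk" for l in is_lion]

# Linguistic link: subtitle mentions seed a noisy positive set; all
# other frames serve as provisional negatives.
labels = np.array(["lion" in s for s in subtitles]).astype(int)

# A few rounds of mining the training set and relearning the model.
for _ in range(3):
    clf = LogisticRegression(max_iter=1000).fit(frames, labels)
    labels = (clf.predict_proba(frames)[:, 1] > 0.5).astype(int)

print("agreement with hidden truth:", (labels == is_lion).mean())
```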
Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-01 DOI: 10.1109/CVPR.2017.488
Anil Armagan, Martin Hirzer, P. Roth, V. Lepetit
{"title":"Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization","authors":"Anil Armagan, Martin Hirzer, P. Roth, V. Lepetit","doi":"10.1109/CVPR.2017.488","DOIUrl":"https://doi.org/10.1109/CVPR.2017.488","url":null,"abstract":"We present an efficient method for geolocalization in urban environments starting from a coarse estimate of the location provided by a GPS and using a simple untextured 2.5D model of the surrounding buildings. Our key contribution is a novel efficient and robust method to optimize the pose: We train a Deep Network to predict the best direction to improve a pose estimate, given a semantic segmentation of the input image and a rendering of the buildings from this estimate. We then iteratively apply this CNN until converging to a good pose. This approach avoids the use of reference images of the surroundings, which are difficult to acquire and match, while 2.5D models are broadly available. We can therefore apply it to places unseen during training.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"33 1","pages":"4590-4597"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73999563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
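The iterative refinement loop can be sketched generically; here the trained CNN is replaced by a cheating oracle (`predict_direction`), since the real network's inputs, a semantic segmentation and a rendering of the 2.5D model, are unavailable in a toy setting. Poses, step sizes, and the stopping rule are hypothetical.

```python
import numpy as np

# Hidden true pose (x, y, heading) and a coarse GPS-style initial guess.
true_pose = np.array([12.0, -5.0, 0.30])
pose = true_pose + np.array([2.0, -1.5, 0.10])   # GPS / compass error

# Discrete refinement moves the network can choose from (last = "stop").
STEPS = [(0.5, 0, 0), (-0.5, 0, 0), (0, 0.5, 0),
         (0, -0.5, 0), (0, 0, 0.05), (0, 0, -0.05), (0, 0, 0)]

def predict_direction(pose):
    """Stand-in for the trained CNN. The real network compares a semantic
    segmentation of the camera image with a 2.5D-model rendering at `pose`;
    this toy oracle cheats and looks at the true pose instead."""
    return min(STEPS, key=lambda s: np.linalg.norm(pose + s - true_pose))

for it in range(100):
    step = predict_direction(pose)
    if step == (0, 0, 0):        # no move improves the pose: converged
        break
    pose = pose + step

print(f"stopped after {it} iterations, "
      f"residual error = {np.linalg.norm(pose - true_pose):.3f}")
```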
Video Segmentation via Multiple Granularity Analysis
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date: 2017-07-01 DOI: 10.1109/CVPR.2017.676
Rui Yang, Bingbing Ni, Chao Ma, Yi Xu, Xiaokang Yang
{"title":"Video Segmentation via Multiple Granularity Analysis","authors":"Rui Yang, Bingbing Ni, Chao Ma, Yi Xu, Xiaokang Yang","doi":"10.1109/CVPR.2017.676","DOIUrl":"https://doi.org/10.1109/CVPR.2017.676","url":null,"abstract":"We introduce a Multiple Granularity Analysis framework for video segmentation in a coarse-to-fine manner. We cast video segmentation as a spatio-temporal superpixel labeling problem. Benefited from the bounding volume provided by off-the-shelf object trackers, we estimate the foreground/ background super-pixel labeling using the spatiotemporal multiple instance learning algorithm to obtain coarse foreground/background separation within the volume. We further refine the segmentation mask in the pixel level using the graph-cut model. Extensive experiments on benchmark video datasets demonstrate the superior performance of the proposed video segmentation algorithm.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"23 1","pages":"6383-6392"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75815414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
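A sketch of the coarse MIL stage, assuming an MI-SVM-style heuristic rather than the paper's spatio-temporal formulation: every instance of a negative bag is negative, and positive bags keep only their highest-scoring instances across retraining rounds. The pixel-level graph-cut refinement is omitted, and all features are synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(9)

# Synthetic supervoxel features: positive bags come from the tracker's
# bounding volume (object plus clutter), negatives from outside it.
fg = rng.normal(loc=+1.0, size=(40, 6))
bg = rng.normal(loc=-1.0, size=(120, 6))
pos_bags = [np.vstack([fg[i*4:(i+1)*4], bg[i*6:(i+1)*6]]) for i in range(10)]
neg = bg[60:]

# Start optimistically: every instance of a positive bag is positive.
X = np.vstack(pos_bags + [neg])
y = np.array([1] * (10 * 10) + [0] * len(neg))

for _ in range(3):
    clf = LinearSVC().fit(X, y)
    y_pos = []
    for bag in pos_bags:
        scores = clf.decision_function(bag)
        # Keep only the top-2 scoring instances of each positive bag.
        y_pos.append((scores >= np.sort(scores)[-2]).astype(int))
    y = np.concatenate(y_pos + [np.zeros(len(neg), dtype=int)])

print("foreground supervoxels per bag:", [int(f.sum()) for f in y_pos])
```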