2013 IEEE International Conference on Computer Vision: Latest Publications

Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.386
Martin Köstinger, Paul Wohlhart, P. Roth, H. Bischof
Abstract: In this paper, we raise important issues concerning the evaluation complexity of existing Mahalanobis metric learning methods. The complexity scales linearly with the size of the dataset. This is especially cumbersome on large scale or for real-time applications with limited time budget. To alleviate this problem we propose to represent the dataset by a fixed number of discriminative prototypes. In particular, we introduce a new method that jointly chooses the positioning of prototypes and also optimizes the Mahalanobis distance metric with respect to these. We show that choosing the positioning of the prototypes and learning the metric in parallel leads to a drastically reduced evaluation effort while maintaining the discriminative essence of the original dataset. Moreover, for most problems our method performing k-nearest prototype (k-NP) classification on the condensed dataset leads to even better generalization compared to k-NN classification using all data. Results on a variety of challenging benchmarks demonstrate the power of our method. These include standard machine learning datasets as well as the challenging Public Figures Face Database. On the competitive machine learning benchmarks we are comparable to the state-of-the-art while being more efficient. On the face benchmark we clearly outperform the state-of-the-art in Mahalanobis metric learning with drastically reduced evaluation effort.
Pages: 3112-3119
Citations: 12
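To make the evaluation-cost argument concrete, the following minimal sketch shows k-nearest-prototype (k-NP) classification under a Mahalanobis metric, assuming the prototype positions, their labels, and the metric matrix M are already given by the joint optimization; the function names and toy data are illustrative, not the authors' code. Evaluation cost depends only on the number of prototypes, not on the size of the original dataset.

```python
import numpy as np

def mahalanobis_dists(x, prototypes, M):
    """Squared Mahalanobis distances (x - p)^T M (x - p) from x to all prototypes."""
    diffs = prototypes - x                      # (P, D)
    return np.einsum('pd,de,pe->p', diffs, M, diffs)

def knp_classify(x, prototypes, proto_labels, M, k=3):
    """k-nearest-prototype classification: majority vote among the k closest prototypes."""
    d = mahalanobis_dists(x, prototypes, M)
    nearest = np.argsort(d)[:k]
    votes = proto_labels[nearest]
    return np.bincount(votes).argmax()

# Toy usage: 6 prototypes in 2D for two classes, identity matrix standing in for the learned metric.
prototypes = np.array([[0., 0.], [0.5, 0.2], [0.1, 0.6],
                       [3., 3.], [3.2, 2.7], [2.8, 3.3]])
proto_labels = np.array([0, 0, 0, 1, 1, 1])
M = np.eye(2)
print(knp_classify(np.array([2.9, 3.1]), prototypes, proto_labels, M, k=3))  # -> 1
```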
Stable Hyper-pooling and Query Expansion for Event Detection
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.229
Matthijs Douze, Jérôme Revaud, C. Schmid, H. Jégou
Abstract: This paper makes two complementary contributions to event retrieval in large collections of videos. First, we propose hyper-pooling strategies that encode the frame descriptors into a representation of the video sequence in a stable manner. Our best choices compare favorably with regular pooling techniques based on k-means quantization. Second, we introduce a technique to improve the ranking. It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. Experiments on public benchmarks show that our methods are complementary and improve event retrieval results, without sacrificing efficiency.
Pages: 1825-1832
Citations: 37
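The baseline the authors compare against, regular pooling based on k-means quantization, can be sketched as follows: each frame descriptor is assigned to its nearest centroid and residuals are aggregated per cell into one fixed-size video vector. This is a hedged illustration of that baseline (VLAD-style aggregation assumed), not the proposed stable hyper-pooling function.

```python
import numpy as np

def kmeans_pool(frame_descs, centroids):
    """Pool per-frame descriptors into one video descriptor: assign each frame to
    its nearest centroid and sum the residuals per cell, then L2-normalize."""
    # frame_descs: (T, D) descriptors for T frames; centroids: (K, D)
    d2 = ((frame_descs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (T, K)
    assign = d2.argmin(axis=1)
    K, D = centroids.shape
    pooled = np.zeros((K, D))
    for k in range(K):
        sel = frame_descs[assign == k]
        if len(sel):
            pooled[k] = (sel - centroids[k]).sum(axis=0)
    v = pooled.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

# Toy usage: 100 random frame descriptors pooled over 4 cells -> a 64-dim video vector.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 16))
cents = rng.normal(size=(4, 16))
print(kmeans_pool(frames, cents).shape)  # (64,)
```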
Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.40
Min Sun, Wanming Huang, S. Savarese
Abstract: Many methods have been proposed to solve the image classification problem for a large number of categories. Among them, methods based on tree-based representations achieve good trade-off between accuracy and test time efficiency. While focusing on learning a tree-shaped hierarchy and the corresponding set of classifiers, most of them [11, 2, 14] use a greedy prediction algorithm for test time efficiency. We argue that the dramatic decrease in accuracy at high efficiency is caused by the specific design choice of the learning and greedy prediction algorithms. In this work, we propose a classifier which achieves a better trade-off between efficiency and accuracy with a given tree-shaped hierarchy. First, we convert the classification problem as finding the best path in the hierarchy, and a novel branch-and-bound-like algorithm is introduced to efficiently search for the best path. Second, we jointly train the classifiers using a novel Structured SVM (SSVM) formulation with additional bound constraints. As a result, our method achieves a significant 4.65%, 5.43%, and 4.07% (relative 24.82%, 41.64%, and 109.79%) improvement in accuracy at high efficiency compared to state-of-the-art greedy "tree-based" methods [14] on Caltech-256 [15], SUN [32] and Image Net 1K [9] dataset, respectively. Finally, we show that our branch-and-bound-like algorithm naturally ranks the paths in the hierarchy (Fig. 8) so that users can further process them.
Pages: 265-272
Citations: 29
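The "best path" view can be illustrated with a small, generic best-first search: each node carries a classifier score, a partial path is ranked by its score plus an optimistic bound, and the first complete root-to-leaf path popped is exact. This is only a sketch of the search idea under an assumed additive path score and a known per-node score bound s_max; it is not the paper's SSVM training or its specific bounds.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def height(node):
    """Number of edges on the longest downward path from node to a leaf."""
    return 0 if not node.children else 1 + max(height(c) for c in node.children)

def best_path(root, score_fn, s_max):
    """Best-first search for the root-to-leaf path maximizing the summed node scores.
    A partial path is ranked by its score plus an optimistic bound (s_max per remaining
    level), so the first leaf popped is guaranteed to be the exact best path."""
    start = score_fn(root)
    heap = [(-(start + s_max * height(root)), start, id(root), root, [root.label])]
    while heap:
        _, score, _, node, path = heapq.heappop(heap)
        if not node.children:          # leaf reached with the highest bound: optimal
            return path, score
        for child in node.children:
            s = score + score_fn(child)
            ub = s + s_max * height(child)
            heapq.heappush(heap, (-ub, s, id(child), child, path + [child.label]))
    return None

# Toy hierarchy: the scores stand in for per-node classifier responses on one image.
tree = Node('root', [Node('animal', [Node('cat'), Node('dog')]),
                     Node('vehicle', [Node('car'), Node('bike')])])
scores = {'root': 0.0, 'animal': 0.4, 'vehicle': 0.9, 'cat': 0.1,
          'dog': 0.2, 'car': 0.8, 'bike': 0.3}
print(best_path(tree, lambda n: scores[n.label], s_max=1.0))
# -> (['root', 'vehicle', 'car'], ~1.7)
```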
What Do You Do? Occupation Recognition in a Photo via Social Context
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.451
Ming Shao, Liangyue Li, Y. Fu
Abstract: In this paper, we investigate the problem of recognizing occupations of multiple people with arbitrary poses in a photo. Previous work utilizing single person's nearly frontal clothing information and fore/background context preliminarily proves that occupation recognition is computationally feasible in computer vision. However, in practice, multiple people with arbitrary poses are common in a photo, and recognizing their occupations is even more challenging. We argue that with appropriately built visual attributes, co-occurrence, and spatial configuration model that is learned through structure SVM, we can recognize multiple people's occupations in a photo simultaneously. To evaluate our method's performance, we conduct extensive experiments on a new well-labeled occupation database with 14 representative occupations and over 7K images. Results on this database validate our method's effectiveness and show that occupation recognition is solvable in a more general case.
Pages: 3631-3638
Citations: 31
Saliency Detection in Large Point Sets
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.446
Elizabeth Shtrom, G. Leifman, A. Tal
Abstract: While saliency in images has been extensively studied in recent years, there is very little work on saliency of point sets. This is despite the fact that point sets and range data are becoming ever more widespread and have myriad applications. In this paper we present an algorithm for detecting the salient points in unorganized 3D point sets. Our algorithm is designed to cope with extremely large sets, which may contain tens of millions of points. Such data is typical of urban scenes, which have recently become commonly available on the web. No previous work has handled such data. For general data sets, we show that our results are competitive with those of saliency detection of surfaces, although we do not have any connectivity information. We demonstrate the utility of our algorithm in two applications: producing a set of the most informative viewpoints and suggesting an informative city tour given a city scan.
Pages: 3591-3598
Citations: 67
Similarity Metric Learning for Face Recognition
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.299
Qiong Cao, Yiming Ying, Peng Li
Abstract: Recently, there is a considerable amount of efforts devoted to the problem of unconstrained face verification, where the task is to predict whether pairs of images are from the same person or not. This problem is challenging and difficult due to the large variations in face images. In this paper, we develop a novel regularization framework to learn similarity metrics for unconstrained face verification. We formulate its objective function by incorporating the robustness to the large intra-personal variations and the discriminative power of novel similarity metrics. In addition, our formulation is a convex optimization problem which guarantees the existence of its global solution. Experiments show that our proposed method achieves the state-of-the-art results on the challenging Labeled Faces in the Wild (LFW) database [10].
Pages: 2408-2415
Citations: 207
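A minimal sketch of how a learned similarity metric is used at verification time: score an image pair and threshold the score. The particular parameterization below, a bilinear similarity term minus a Mahalanobis distance term, is an assumption for illustration, and the matrices M and G stand in for the outcome of the paper's convex training.

```python
import numpy as np

def similarity_score(x, y, M, G):
    """Combined similarity: bilinear term x^T G y minus Mahalanobis term (x-y)^T M (x-y).
    (One common parameterization for learned verification metrics; assumed here.)"""
    diff = x - y
    return x @ G @ y - diff @ M @ diff

def same_person(x, y, M, G, threshold=0.0):
    """Verification decision: 'same' if the learned similarity exceeds a threshold."""
    return similarity_score(x, y, M, G) > threshold

# Toy usage with identity matrices standing in for the learned M and G.
d = 8
rng = np.random.default_rng(1)
x = rng.normal(size=d); x /= np.linalg.norm(x)
y = x + 0.05 * rng.normal(size=d); y /= np.linalg.norm(y)   # near-duplicate pair
z = rng.normal(size=d); z /= np.linalg.norm(z)              # unrelated face
M, G = np.eye(d), np.eye(d)
print(same_person(x, y, M, G), same_person(x, z, M, G))      # expected: True False
```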
Forward Motion Deblurring
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.185
Shicheng Zheng, Li Xu, Jiaya Jia
Abstract: We handle a special type of motion blur considering that cameras move primarily forward or backward. Solving this type of blur is of unique practical importance since nearly all car, traffic and bike-mounted cameras follow out-of-plane translational motion. We start with the study of geometric models and analyze the difficulty of existing methods to deal with them. We also propose a solution accounting for depth variation. Homographies associated with different 3D planes are considered and solved for in an optimization framework. Our method is verified on several natural image examples that cannot be satisfyingly dealt with by previous methods.
Pages: 1465-1472
Citations: 47
Structured Forests for Fast Edge Detection
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.231
Piotr Dollár, C. L. Zitnick
Abstract: Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains real time performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing our learned edge models generalize well across datasets.
Pages: 1841-1848
Citations: 934
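The key trick the abstract mentions, mapping structured labels into a discrete space so that standard information-gain splits apply, can be sketched as follows: edge-mask patches are clustered into a small set of discrete labels, an ordinary random forest is trained on those labels, and a prediction indexes back into a stored mask. The clustering choice, patch sizes, and random data here are illustrative stand-ins, not the paper's exact mapping.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
features = rng.normal(size=(n, 64))                                   # stand-in patch features
edge_masks = (rng.random(size=(n, 16 * 16)) < 0.2).astype(np.uint8)   # stand-in 16x16 edge masks

# 1) Discretize the structured label space: cluster the masks into C prototype masks.
C = 8
mask_clusters = KMeans(n_clusters=C, n_init=10, random_state=0).fit(edge_masks)
discrete_labels = mask_clusters.labels_

# 2) Train an ordinary random forest on the discrete labels using standard splits.
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(features, discrete_labels)

# 3) At test time, a predicted discrete label indexes back into a stored edge mask
#    (here simply the cluster's mean mask, thresholded).
pred = forest.predict(features[:1])[0]
predicted_mask = (mask_clusters.cluster_centers_[pred] > 0.5).reshape(16, 16)
print(pred, predicted_mask.shape)
```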
From Where and How to What We See
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.83
S. Karthikeyan, V. Jagadeesh, Renuka Shenoy, M. Eckstein, B. S. Manjunath
Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.
Pages: 625-632
Citations: 32
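A minimal sketch of the first stage only, spatially clustering eye-tracking data into coherent groups, is shown below; the MRF that assigns face/text/background labels to the clusters is not reproduced. MeanShift is an assumed, illustrative clustering choice and the fixation data are synthetic.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Synthetic fixations from several subjects: one dense group over a face region
# and one over a text region, in pixel coordinates.
rng = np.random.default_rng(3)
face_fixations = rng.normal(loc=[120, 80], scale=10, size=(60, 2))
text_fixations = rng.normal(loc=[300, 220], scale=12, size=(40, 2))
fixations = np.vstack([face_fixations, text_fixations])

# Spatially cluster the fixations into coherent groups (stage one of the pipeline).
clusters = MeanShift(bandwidth=30).fit(fixations)
print(len(clusters.cluster_centers_), clusters.cluster_centers_.round(1))
```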
Dynamic Pooling for Complex Event Recognition
2013 IEEE International Conference on Computer Vision | Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.339
Wei-Xin Li, Qian Yu, Ajay Divakaran, N. Vasconcelos
Abstract: The problem of adaptively selecting pooling regions for the classification of complex video events is considered. Complex events are defined as events composed of several characteristic behaviors, whose temporal configuration can change from sequence to sequence. A dynamic pooling operator is defined so as to enable a unified solution to the problems of event specific video segmentation, temporal structure modeling, and event detection. Video is decomposed into segments, and the segments most informative for detecting a given event are identified, so as to dynamically determine the pooling operator most suited for each sequence. This dynamic pooling is implemented by treating the locations of characteristic segments as hidden information, which is inferred, on a sequence-by-sequence basis, via a large-margin classification rule with latent variables. Although the feasible set of segment selections is combinatorial, it is shown that a globally optimal solution to the inference problem can be obtained efficiently, through the solution of a series of linear programs. Besides the coarse-level location of segments, a finer model of video structure is implemented by jointly pooling features of segment-tuples. Experimental evaluation demonstrates that the resulting event detector has state-of-the-art performance on challenging video datasets.
Pages: 2728-2735
Citations: 50
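The latent selection behind dynamic pooling can be illustrated for the simplest constraint, keeping a fixed number of segments: score each segment with the event model, select the top-k, and pool only those. For this plain cardinality constraint the greedy top-k choice is exact; the paper's more general feasible sets require solving small linear programs instead. The model vector and toy data below are stand-ins, not the authors' implementation.

```python
import numpy as np

def dynamic_pool(segment_feats, w, k):
    """Latent segment selection for one video: score every segment with w, keep the
    k highest-scoring segments (hidden variables), and average-pool only those."""
    scores = segment_feats @ w                       # (num_segments,)
    chosen = np.argsort(scores)[-k:]                 # indices of the k best segments
    pooled = segment_feats[chosen].mean(axis=0)      # pooled video representation
    return pooled, chosen, float(pooled @ w)

# Toy usage: a video split into 12 segments with 32-dim features.
rng = np.random.default_rng(2)
segs = rng.normal(size=(12, 32))
w = rng.normal(size=32)                              # stands in for the learned event model
pooled, chosen, score = dynamic_pool(segs, w, k=3)
print(sorted(chosen.tolist()), round(score, 3))
```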