{"title":"Clique-graph matching by preserving global & local structure","authors":"Weizhi Nie, Anan Liu, Zan Gao, Yuting Su","doi":"10.1109/CVPR.2015.7299080","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7299080","url":null,"abstract":"This paper originally proposes the clique-graph and further presents a clique-graph matching method by preserving global and local structures. Especially, we formulate the objective function of clique-graph matching with respective to two latent variables, the clique information in the original graph and the pairwise clique correspondence constrained by the one-to-one matching. Since the objective function is not jointly convex to both latent variables, we decompose it into two consecutive steps for optimization: 1) clique-to-clique similarity measure by preserving local unary and pairwise correspondences; 2) graph-to-graph similarity measure by preserving global clique-to-clique correspondence. Extensive experiments on the synthetic data and real images show that the proposed method can outperform representative methods especially when both noise and outliers exist.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124819581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to look up: Realtime monocular gaze correction using machine learning","authors":"D. Kononenko, V. Lempitsky","doi":"10.1109/CVPR.2015.7299098","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7299098","url":null,"abstract":"We revisit the well-known problem of gaze correction and present a solution based on supervised machine learning. At training time, our system observes pairs of images, where each pair contains the face of the same person with a fixed angular difference in gaze direction. It then learns to synthesize the second image of a pair from the first one. After learning, the system gets the ability to redirect the gaze of a previously unseen person by the same angular difference as in the training set. Unlike many previous solutions to gaze problem in videoconferencing, ours is purely monocular, i.e. it does not require any hardware apart from an in-built web-camera of a laptop. Being based on efficient machine learning predictors such as decision forests, the system is fast (runs in real-time on a single core of a modern laptop). In the paper, we demonstrate results on a variety of videoconferencing frames and evaluate the method quantitatively on the hold-out set of registered images. The supplementary video shows example sessions of our system at work.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124819845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast action proposals for human action detection and search","authors":"Gang Yu, Junsong Yuan","doi":"10.1109/CVPR.2015.7298735","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298735","url":null,"abstract":"In this paper we target at generating generic action proposals in unconstrained videos. Each action proposal corresponds to a temporal series of spatial bounding boxes, i.e., a spatio-temporal video tube, which has a good potential to locate one human action. Assuming each action is performed by a human with meaningful motion, both appearance and motion cues are utilized to measure the actionness of the video tubes. After picking those spatiotemporal paths of high actionness scores, our action proposal generation is formulated as a maximum set coverage problem, where greedy search is performed to select a set of action proposals that can maximize the overall actionness score. Compared with existing action proposal approaches, our action proposals do not rely on video segmentation and can be generated in nearly real-time. Experimental results on two challenging datasets, MSRII and UCF 101, validate the superior performance of our action proposals as well as competitive results on action detection and search.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121687842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transferring a semantic representation for person re-identification and search","authors":"Zhiyuan Shi, Timothy M. Hospedales, T. Xiang","doi":"10.1109/CVPR.2015.7299046","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7299046","url":null,"abstract":"Learning semantic attributes for person re-identification and description-based person search has gained increasing interest due to attributes' great potential as a pose and view-invariant representation. However, existing attribute-centric approaches have thus far underperformed state-of-the-art conventional approaches. This is due to their nonscalable need for extensive domain (camera) specific annotation. In this paper we present a new semantic attribute learning approach for person re-identification and search. Our model is trained on existing fashion photography datasets - either weakly or strongly labelled. It can then be transferred and adapted to provide a powerful semantic description of surveillance person detections, without requiring any surveillance domain supervision. The resulting representation is useful for both unsupervised and supervised person re-identification, achieving state-of-the-art and near state-of-the-art performance respectively. Furthermore, as a semantic representation it allows description-based person search to be integrated within the same framework.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121723974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust video segment proposals with painless occlusion handling","authors":"Zhengyang Wu, Fuxin Li, R. Sukthankar, James M. Rehg","doi":"10.1109/CVPR.2015.7299047","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7299047","url":null,"abstract":"We propose a robust algorithm to generate video segment proposals. The proposals generated by our method can start from any frame in the video and are robust to complete occlusions. Our method does not assume specific motion models and even has a limited capability to generalize across videos. We build on our previous least squares tracking framework, where image segment proposals are generated and tracked using learned appearance models. The innovation in our new method lies in the use of two efficient moves, the merge move and free addition, to efficiently start segments from any frame and track them through complete occlusions, without much additional computation. Segment size interpolation is used for effectively detecting occlusions. We propose a new metric for evaluating video segment proposals on the challenging VSB-100 benchmark and present state-of-the-art results. Preliminary results are also shown for the potential use of our framework to track segments across different videos.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122044893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse composite quantization","authors":"Ting Zhang, Guo-Jun Qi, Jinhui Tang, Jingdong Wang","doi":"10.1109/CVPR.2015.7299085","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7299085","url":null,"abstract":"The quantization techniques have shown competitive performance in approximate nearest neighbor search. The state-of-the-art algorithm, composite quantization, takes advantage of the compositionabity, i.e., the vector approximation accuracy, as opposed to product quantization and Cartesian k-means. However, we have observed that the runtime cost of computing the distance table in composite quantization, which is used as a lookup table for fast distance computation, becomes nonnegligible in real applications, e.g., reordering the candidates retrieved from the inverted index when handling very large scale databases. To address this problem, we develop a novel approach, called sparse composite quantization, which constructs sparse dictionaries. The benefit is that the distance evaluation between the query and the dictionary element (a sparse vector) is accelerated using the efficient sparse vector operation, and thus the cost of distance table computation is reduced a lot. Experiment results on large scale ANN retrieval tasks (1M SIFTs and 1B SIFTs) and applications to object retrieval show that the proposed approach yields competitive performance: superior search accuracy to product quantization and Cartesian k-means with almost the same computing cost, and much faster ANN search than composite quantization with the same level of accuracy.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122533951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences","authors":"Tinghui Zhou, Yong Jae Lee, Stella X. Yu, Alexei A. Efros","doi":"10.1109/CVPR.2015.7298723","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298723","url":null,"abstract":"Given a set of poorly aligned images of the same visual concept without any annotations, we propose an algorithm to jointly bring them into pixel-wise correspondence by estimating a FlowWeb representation of the image set. FlowWeb is a fully-connected correspondence flow graph with each node representing an image, and each edge representing the correspondence flow field between a pair of images, i.e. a vector field indicating how each pixel in one image can find a corresponding pixel in the other image. Correspondence flow is related to optical flow but allows for correspondences between visually dissimilar regions if there is evidence they correspond transitively on the graph. Our algorithm starts by initializing all edges of this complete graph with an off-the-shelf, pairwise flow method. We then iteratively update the graph to force it to be more self-consistent. Once the algorithm converges, dense, globally-consistent correspondences can be read off the graph. Our results suggest that FlowWeb improves alignment accuracy over previous pairwise as well as joint alignment methods.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125116660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deepshape: Deep learned shape descriptor for 3D shape matching and retrieval","authors":"J. Xie, Yi Fang, Fan Zhu, E. Wong","doi":"10.1109/CVPR.2015.7298732","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298732","url":null,"abstract":"Complex geometric structural variations of 3D model usually pose great challenges in 3D shape matching and retrieval. In this paper, we propose a high-level shape feature learning scheme to extract features that are insensitive to deformations via a novel discriminative deep auto-encoder. First, a multiscale shape distribution is developed for use as input to the auto-encoder. Then, by imposing the Fisher discrimination criterion on the neurons in the hidden layer, we developed a novel discriminative deep auto-encoder for shape feature learning. Finally, the neurons in the hidden layers from multiple discriminative auto-encoders are concatenated to form a shape descriptor for 3D shape matching and retrieval. The proposed method is evaluated on the representative datasets that contain 3D models with large geometric variations, i.e., Mcgill and SHREC'10 ShapeGoogle datasets. Experimental results on the benchmark datasets demonstrate the effectiveness of the proposed method for 3D shape matching and retrieval.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125558248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web scale photo hash clustering on a single machine","authors":"Yunchao Gong, Marcin Pawłowski, Fei Yang, Louis Brandy, Lubomir D. Bourdev, R. Fergus","doi":"10.1109/CVPR.2015.7298596","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298596","url":null,"abstract":"This paper addresses the problem of clustering a very large number of photos (i.e. hundreds of millions a day) in a stream into millions of clusters. This is particularly important as the popularity of photo sharing websites, such as Facebook, Google, and Instagram. Given large number of photos available online, how to efficiently organize them is an open problem. To address this problem, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers on which we can build hash indexes to speedup computation. The proposed method is capable of clustering millions of photos on a single machine in a few minutes. We show that this approach is usually several magnitude faster than standard k-means and produces comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that is capable of clustering large photo stream on a single machine, and show applications to spam detection and trending photo discovery.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125871587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active learning for structured probabilistic models with histogram approximation","authors":"Q. Sun, A. Laddha, Dhruv Batra","doi":"10.1109/CVPR.2015.7298984","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298984","url":null,"abstract":"This paper studies active learning in structured probabilistic models such as Conditional Random Fields (CRFs). This is a challenging problem because unlike unstructured prediction problems such as binary or multi-class classification, structured prediction problems involve a distribution with an exponentially-large support, for instance, over the space of all possible segmentations of an image. Thus, the entropy of such models is typically intractable to compute. We propose a crude yet surprisingly effective histogram approximation to the Gibbs distribution, which replaces the exponentially-large support with a coarsened distribution that may be viewed as a histogram over M bins. We show that our approach outperforms a number of baselines and results in a 90%-reduction in the number of annotations needed to achieve nearly the same accuracy as learning from the entire dataset.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129805604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}