nocaps: novel object captioning at scale
Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
ICCV 2019, pp. 8947-8956. DOI: https://doi.org/10.1109/ICCV.2019.00904
Abstract: Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. Dubbed ‘nocaps’, for novel object captioning at scale, our benchmark consists of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets. The associated training data consists of COCO image-caption pairs, plus Open Images image-level labels and object bounding boxes. Since Open Images contains many more classes than COCO, nearly 400 object classes seen in test images have no or very few associated training captions (hence, nocaps). We extend existing novel object captioning models to establish strong baselines for this benchmark and provide analysis to guide future work.
Joint Scale-Spatial Correlation Tracking with Adaptive Rotation Estimation
Mengdan Zhang, Junliang Xing, Jin Gao, Xinchu Shi, Qiang Wang, Weiming Hu
ICCV Workshops 2015, pp. 595-603. DOI: https://doi.org/10.1109/ICCVW.2015.81
Abstract: Boosted by large and standardized benchmark datasets, visual object tracking has made great progress in recent years and brought about many new trackers. Among these trackers, the correlation filter based tracking scheme exhibits impressive robustness and accuracy. In this work, we present a fully functional correlation filter based tracking algorithm that simultaneously models target appearance changes from spatial displacements, scale variations, and rotation transformations. The proposed tracker first represents the exhaustive template search in the joint scale and spatial space by a block-circulant matrix. Then, by transferring the target template from the Cartesian coordinate system to the log-polar coordinate system, the circulant structure is well preserved for the target even under arbitrary rotation. With this novel representation and transformation, object tracking is performed efficiently and effectively in the joint space with the fast Fourier transform. Experimental results on the VOT 2015 benchmark dataset demonstrate its superior performance over state-of-the-art tracking algorithms.
{"title":"Aggregating Local Deep Features for Image Retrieval","authors":"Artem Babenko, V. Lempitsky","doi":"10.1109/ICCV.2015.150","DOIUrl":"https://doi.org/10.1109/ICCV.2015.150","url":null,"abstract":"Several recent works have shown that image descriptors produced by deep convolutional neural networks provide state-of-the-art performance for image classification and retrieval problems. It also has been shown that the activations from the convolutional layers can be interpreted as local features describing particular image regions. These local features can be aggregated using aggregating methods developed for local features (e.g. Fisher vectors), thus providing new powerful global descriptor. In this paper we investigate possible ways to aggregate local deep features to produce compact descriptors for image retrieval. First, we show that deep features and traditional hand-engineered features have quite different distributions of pairwise similarities, hence existing aggregation methods have to be carefully re-evaluated. Such re-evaluation reveals that in contrast to shallow features, the simple aggregation method based on sum pooling provides the best performance for deep convolutional features. This method is efficient, has few parameters, and bears little risk of overfitting when e.g. learning the PCA matrix. In addition, we suggest a simple yet efficient query expansion scheme suitable for the proposed aggregation method. Overall, the new compact global descriptor improves the state-of-the-art on four common benchmarks considerably.","PeriodicalId":72022,"journal":{"name":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","volume":"3 1","pages":"1269-1277"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81353089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural Kernel Learning for Large Scale Multiclass Object Co-detection","authors":"Zeeshan Hayder, Xuming He, M. Salzmann","doi":"10.1109/ICCV.2015.302","DOIUrl":"https://doi.org/10.1109/ICCV.2015.302","url":null,"abstract":"Exploiting contextual relationships across images has recently proven key to improve object detection. The resulting object co-detection algorithms, however, fail to exploit the correlations between multiple classes and, for scalability reasons are limited to modeling object instance similarity with relatively low-dimensional hand-crafted features. Here, we address the problem of multiclass object co-detection for large scale datasets. To this end, we formulate co-detection as the joint multiclass labeling of object candidates obtained in a class-independent manner. To exploit the correlations between objects, we build a fully-connected CRF on the candidates, which explicitly incorporates both geometric layout relations across object classes and similarity relations across multiple images. We then introduce a structural boosting algorithm that lets us exploits rich, high-dimensional deep network features to learn object similarity within our fully-connected CRF. Our experiments on PASCAL VOC 2007 and 2012 evidences the benefits of our approach over object detection with RCNN, single-image CRF methods and state-of-the-art co-detection algorithms.","PeriodicalId":72022,"journal":{"name":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","volume":"10 1","pages":"2632-2640"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80880384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shape Index Descriptors Applied to Texture-Based Galaxy Analysis
K. S. Pedersen, Kristoffer Stensbo-Smidt, A. Zirm, C. Igel
ICCV 2013, pp. 2440-2447. DOI: https://doi.org/10.1109/ICCV.2013.303
Abstract: A texture descriptor based on the shape index and the accompanying curvedness measure is proposed, and it is evaluated for the automated analysis of astronomical image data. A representative sample of images of low-redshift galaxies from the Sloan Digital Sky Survey (SDSS) serves as a test bed. The goal of applying texture descriptors to these data is to extract novel information about galaxies, information which is often lost in more traditional analysis. In this study, we build a regression model for predicting a spectroscopic quantity, the specific star-formation rate (sSFR). As texture features we consider multi-scale gradient orientation histograms as well as multi-scale shape index histograms, which lead to a new descriptor. Our results show that we can successfully predict spectroscopic quantities from the texture in optical multi-band images. We successfully recover the observed bi-modal distribution of galaxies into quiescent and star-forming. The state of the art for predicting the sSFR is a color-based physical model. We significantly improve its accuracy by augmenting the model with texture information. This study is the first step towards enabling the quantification of physical galaxy properties from imaging data alone.
{"title":"Energy Association Filter for Online Data Association with Missing Data","authors":"E. Abir, Dubuisson Séverine, Béréziat Dominique","doi":"10.1007/978-3-540-89682-1_18","DOIUrl":"https://doi.org/10.1007/978-3-540-89682-1_18","url":null,"abstract":"","PeriodicalId":72022,"journal":{"name":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","volume":"25 1","pages":"244-257"},"PeriodicalIF":0.0,"publicationDate":"2007-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86990220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conditional Random Fields for Contextual Human Motion Recognition
C. Sminchisescu, Atul Kanaujia, Zhiguo Li, Dimitris N. Metaxas
ICCV 2005, pp. 1808-1815. DOI: https://doi.org/10.1109/ICCV.2005.59
Abstract: We present algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs). Existing approaches to this problem typically use generative (joint) structures like the hidden Markov model (HMM). Therefore they have to make simplifying, often unrealistic assumptions on the conditional independence of observations given the motion class labels, and cannot accommodate overlapping features or long-term contextual dependencies in the observation sequence. In contrast, conditional models like CRFs seamlessly represent contextual dependencies, support efficient, exact inference using dynamic programming, and their parameters can be trained using convex optimization. We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments showing that they typically outperform HMMs, not only in classifying diverse human activities like walking, jumping, running, picking, or dancing, but also in discriminating among subtle motion styles like normal walk and wander walk.
{"title":"An affine invariant deformable shape representation for general curves","authors":"Astrom","doi":"10.1109/ICCV.2003.1238477","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238477","url":null,"abstract":"Automatic construction of shape models from examples has been the focus of intense research during the last couple of years. These methods have proved to be useful for shape segmentation, tracking and shape understanding. In this paper novel theory to automate shape modelling is described. The theory is intrinsically defined for curves although curves are infinite dimensional objects. The theory is independent of parameterisation and affine transformations. We suggest a method for implementing the ideas and compare it to minimising the description length of the model (MDL). It turns out that the accuracy of the two methods is comparable. Both the MDL and our approach can get stuck at local minima. Our algorithm is less computational expensive and relatively good solutions are obtained after a few iterations. The MDL is, however, better suited at fine-tuning the parameters given good initial estimates to the problem. It is shown that a combination of the two methods outperforms either on its own.","PeriodicalId":72022,"journal":{"name":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","volume":"50 1","pages":"1142-1149"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76499372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Very High Accuracy Velocity Estimation using Orientation Tensors Parametric Motion and Simultaneous Segmentation of the Motion Field","authors":"Gunnar Farnebäck","doi":"10.1109/ICCV.2001.10042","DOIUrl":"https://doi.org/10.1109/ICCV.2001.10042","url":null,"abstract":"In a previous paper, the author presented a new velocity estimation algorithm, using orientation tensors and parametric motion models to provide both fast and accurate results. One of the tradeoffs between accuracy and speed was that no attempts were made to obtain regions of coherent motion when estimating the parametric models. In this paper we show how this can be improved by doing a simultaneous segmentation of the motion field. The resulting algorithm is slower than the previous one, but more accurate. This is shown by evaluation on the well-known Yosemite sequence, where already the previous algorithm showed an accuracy which was substantially better than for earlier published methods. This result has now been improved further.","PeriodicalId":72022,"journal":{"name":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","volume":"50 1","pages":"171-177"},"PeriodicalIF":0.0,"publicationDate":"2001-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86108139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Projection Matrices and their Applications in Computer Vision","authors":"Lior Wolf, A. Shashua","doi":"10.1109/ICCV.2001.10057","DOIUrl":"https://doi.org/10.1109/ICCV.2001.10057","url":null,"abstract":"","PeriodicalId":72022,"journal":{"name":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","volume":"11 1","pages":"412-419"},"PeriodicalIF":0.0,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75317134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}