{"title":"Simultaneous Object Recognition and Localization in Image Collections","authors":"Shao-Chuan Wang, Y. Wang","doi":"10.1109/AVSS.2010.47","DOIUrl":null,"url":null,"abstract":"This papers presents a weakly supervised method to simultaneouslyaddress object localization and recognitionproblems. Unlike prior work using exhaustive search methodssuch as sliding windows, we propose to learn categoryand image-specific visual words in image collections by extractingdiscriminating feature information via two differenttypes of support vector machines: the standard L2-regularized L1-loss SVM, and the one with L1 regularizationand L2 loss. The selected visual words are used toconstruct visual attention maps, which provide descriptiveinformation for each object category. To preserve local spatialinformation, we further refine these maps by Gaussiansmoothing and cross bilateral filtering, and thus both appearanceand spatial information can be utilized for visualcategorization applications. Our method is not limited toany specific type of image descriptors, or any particularcodebook learning and feature encoding techniques. In thispaper, we conduct preliminary experiments on a subset ofthe Caltech-256 dataset using bag-of-feature (BOF) modelswith SIFT descriptors. We show that the use of our visual attentionmaps improves the recognition performance, whilethe one selected by L1-regularized L2-loss SVMs exhibitsthe best recognition and localization results.","PeriodicalId":415758,"journal":{"name":"2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AVSS.2010.47","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
This paper presents a weakly supervised method to simultaneously address object localization and recognition problems. Unlike prior work using exhaustive search methods such as sliding windows, we propose to learn category- and image-specific visual words in image collections by extracting discriminating feature information via two different types of support vector machines: the standard L2-regularized L1-loss SVM, and the one with L1 regularization and L2 loss. The selected visual words are used to construct visual attention maps, which provide descriptive information for each object category. To preserve local spatial information, we further refine these maps by Gaussian smoothing and cross bilateral filtering, and thus both appearance and spatial information can be utilized for visual categorization applications. Our method is not limited to any specific type of image descriptors, or any particular codebook learning and feature encoding techniques. In this paper, we conduct preliminary experiments on a subset of the Caltech-256 dataset using bag-of-feature (BOF) models with SIFT descriptors. We show that the use of our visual attention maps improves the recognition performance, while the one selected by L1-regularized L2-loss SVMs exhibits the best recognition and localization results.
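The abstract's core idea is that sparsity in an L1-regularized L2-loss linear SVM singles out discriminative visual words from bag-of-feature histograms. The sketch below illustrates that selection step only, using scikit-learn's LinearSVC; the toy data, variable names, and parameter values are illustrative assumptions, not the authors' actual pipeline or settings.

```python
# Minimal sketch (assumed setup): select category-specific visual words by
# training an L1-regularized, squared-hinge (L2) loss linear SVM on BOF
# histograms and keeping the codebook entries with nonzero weights.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy stand-in for bag-of-feature (BOF) histograms: 200 images encoded over a
# hypothetical codebook of 1000 visual words, with binary category labels.
X = rng.random((200, 1000))        # rows: images, columns: visual-word counts
y = rng.integers(0, 2, size=200)   # 1 = target category, 0 = background

# L1 regularization drives most visual-word weights to exactly zero, so the
# surviving nonzero weights act as the selected, discriminative visual words.
svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0)
svm.fit(X, y)

selected_words = np.flatnonzero(svm.coef_.ravel())
print(f"{selected_words.size} of {X.shape[1]} visual words selected")
```

In the paper's pipeline these selected words would then vote into per-image visual attention maps, which are further refined by Gaussian smoothing and cross bilateral filtering before categorization; those refinement steps are not shown here.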