Selection of relevant information to improve Image Classification using Bag of Visual Words
Author: Eduardo Fidalgo Fernández
Journal: Electronic Letters on Computer Vision and Image Analysis, 12 1, pp. 5-8 (JCR Q4, Computer Science)
Published: 2018-03-05
DOI: https://doi.org/10.5565/REV/ELCVIA.1102
Citations: 1
Abstract
Image classification is one of the main challenges in computer vision. The number of images grows exponentially every day, so classifying them reliably is important. The conventional image classification pipeline usually consists of extracting local image features, encoding them as a feature vector, and classifying that vector with a previously trained model. With regard to feature encoding, the Bag of Words model and its extensions, such as pyramid matching and weighted schemes, have achieved quite good results and have become state-of-the-art methods. This process is not perfect: computers, like humans, may make mistakes at any step, causing a drop in classification performance. Some of the primary sources of error in large-scale image classification are the presence of multiple objects in the image, small or very thin objects, incorrect annotations, and fine-grained recognition tasks, among others. Based on those problems and the steps of a typical image classification pipeline, the motivation of this PhD thesis was to provide guidelines for improving the quality of the extracted features in order to obtain better classification results. The contributions of the thesis demonstrate how good feature selection can improve fine-grained classification, and that a large training set may not even be needed to learn the key features of each class and to predict with good results.
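The encoding and classification steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: it assumes a codebook of visual words is already available (in practice this is often learned by clustering descriptors), it uses toy 2-D points in place of real local descriptors, and it substitutes a simple nearest-histogram rule for the trained classifier. The function names `nearest_word`, `bovw_histogram`, and `classify` and all numeric values are hypothetical.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bovw_histogram(descriptors, codebook):
    """Encode local descriptors as an L1-normalised visual-word histogram."""
    if not descriptors:
        # No features extracted: return an all-zero histogram.
        return [0.0] * len(codebook)
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total for i in range(len(codebook))]


def classify(histogram, class_histograms):
    """Assign the label whose reference histogram is closest (assumed 1-NN rule)."""
    return min(class_histograms, key=lambda label: dist(histogram, class_histograms[label]))


# Toy data (hypothetical): a 3-word codebook and four 2-D "descriptors".
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bovw_histogram(image_descriptors, codebook)

# Hypothetical per-class reference histograms.
class_histograms = {"cat": [0.25, 0.5, 0.25], "dog": [0.0, 0.0, 1.0]}
label = classify(hist, class_histograms)
print(hist, label)
```

In this sketch, improving feature quality would amount to filtering `image_descriptors` before calling `bovw_histogram`, so that descriptors from background or irrelevant objects do not contribute to the histogram.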