{"title":"Shallow vs. Deep Image Representations: A comparative Study Applied for the Problem of Generic Object Recognition","authors":"Yasser M. Abdullah, Mussa M. Ahmed","doi":"10.1109/ICOICE48418.2019.9035136","DOIUrl":null,"url":null,"abstract":"The traditional approach for solving the object recognition problem requires image representations to be first extracted and then fed to a learning model such as an SVM to learn the classification decision boundary. These representations are handcrafted and heavily engineered by running the object image through a sequence of pipeline processes that require a good prior knowledge of the problem domain. However, in end-to-end deep learning models, image representations along with classification decision boundary are all learnt directly from the raw image pixels requiring no prior knowledge of the problem domain. Moreover, the deep model features are more discriminative than handcrafted ones since the model is trained to discriminate between features belonging to different classes. The purpose of this study is six fold: (1) review the literature of the pipeline processes used in the previous state-of-the-art codebook model approach for tackling the problem of generic object recognition, (2) Introduce several enhancements in the local feature extraction and normalization processes of the recognition pipeline, (3) compare the enhancements proposed to different encoding methods and contrast them to previous results, (4) experiment with current state-of-the-art deep model architectures used for object recognition, (5) compare between deep representations extracted from the deep learning model and shallow representations handcrafted by an expert and produced through the recognition pipeline, and finally, (6) improve the results further by combining multiple different deep learning models into an ensemble and taking the maximum posterior probability.","PeriodicalId":109414,"journal":{"name":"2019 First International Conference of Intelligent Computing and Engineering (ICOICE)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 First International Conference of Intelligent Computing and Engineering (ICOICE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICOICE48418.2019.9035136","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The traditional approach to solving the object recognition problem requires image representations to be extracted first and then fed to a learning model, such as an SVM, to learn the classification decision boundary. These representations are handcrafted and heavily engineered by running the object image through a sequence of pipeline processes that require good prior knowledge of the problem domain. In end-to-end deep learning models, by contrast, the image representations and the classification decision boundary are learnt directly from the raw image pixels, requiring no prior knowledge of the problem domain. Moreover, deep model features are more discriminative than handcrafted ones, since the model is trained to discriminate between features belonging to different classes. The purpose of this study is sixfold: (1) review the literature on the pipeline processes used in the previous state-of-the-art codebook model approach to generic object recognition; (2) introduce several enhancements to the local feature extraction and normalization processes of the recognition pipeline; (3) apply the proposed enhancements to different encoding methods and contrast the results with previous ones; (4) experiment with current state-of-the-art deep model architectures used for object recognition; (5) compare deep representations extracted from the deep learning models with shallow representations handcrafted by an expert through the recognition pipeline; and finally, (6) improve the results further by combining multiple deep learning models into an ensemble and taking the maximum posterior probability.
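For context, the sketch below outlines the classical codebook (bag-of-visual-words) pipeline the abstract contrasts with deep models, assuming local descriptors (e.g. dense SIFT) have already been extracted for each image. Helper names such as build_codebook and encode_bow are illustrative and not taken from the paper.

```python
# Minimal sketch of the codebook pipeline: cluster local descriptors into
# visual words, encode each image as a normalized word histogram, and train
# a linear SVM on the resulting shallow representations.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC


def build_codebook(all_descriptors, k=256, seed=0):
    """Cluster the pooled local descriptors into k visual words."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(all_descriptors)


def encode_bow(descriptors, codebook):
    """Hard-assignment histogram encoding followed by L2 normalization."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: 20 "images", each with 100 local 128-D descriptors.
    images = [rng.normal(size=(100, 128)) for _ in range(20)]
    labels = rng.integers(0, 2, size=20)

    codebook = build_codebook(np.vstack(images), k=32)
    X = np.array([encode_bow(d, codebook) for d in images])

    clf = LinearSVC(C=1.0).fit(X, labels)  # learn the decision boundary
    print("train accuracy:", clf.score(X, labels))
```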
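The ensemble step of contribution (6) can be read as per-class max-fusion of the posteriors produced by several trained models, followed by picking the most confident class. The sketch below illustrates that reading; it is an assumption about the fusion rule, not the authors' exact implementation.

```python
# Max-posterior fusion over an ensemble of models (illustrative reading).
import numpy as np


def max_posterior_ensemble(posteriors):
    """posteriors: array of shape (n_models, n_images, n_classes).

    Returns the predicted class index per image after max-fusion over models.
    """
    fused = posteriors.max(axis=0)   # per-class maximum posterior over models
    return fused.argmax(axis=1)      # most confident class per image


if __name__ == "__main__":
    # Two hypothetical models scoring three images over four classes.
    p = np.array([
        [[0.7, 0.1, 0.1, 0.1],
         [0.2, 0.5, 0.2, 0.1],
         [0.1, 0.1, 0.2, 0.6]],
        [[0.4, 0.3, 0.2, 0.1],
         [0.1, 0.2, 0.6, 0.1],
         [0.3, 0.3, 0.2, 0.2]],
    ])
    print(max_posterior_ensemble(p))  # -> [0 2 3]
```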