{"title":"Shallow vs. Deep Image Representations: A comparative Study Applied for the Problem of Generic Object Recognition","authors":"Yasser M. Abdullah, Mussa M. Ahmed","doi":"10.1109/ICOICE48418.2019.9035136","DOIUrl":null,"url":null,"abstract":"The traditional approach for solving the object recognition problem requires image representations to be first extracted and then fed to a learning model such as an SVM to learn the classification decision boundary. These representations are handcrafted and heavily engineered by running the object image through a sequence of pipeline processes that require a good prior knowledge of the problem domain. However, in end-to-end deep learning models, image representations along with classification decision boundary are all learnt directly from the raw image pixels requiring no prior knowledge of the problem domain. Moreover, the deep model features are more discriminative than handcrafted ones since the model is trained to discriminate between features belonging to different classes. The purpose of this study is six fold: (1) review the literature of the pipeline processes used in the previous state-of-the-art codebook model approach for tackling the problem of generic object recognition, (2) Introduce several enhancements in the local feature extraction and normalization processes of the recognition pipeline, (3) compare the enhancements proposed to different encoding methods and contrast them to previous results, (4) experiment with current state-of-the-art deep model architectures used for object recognition, (5) compare between deep representations extracted from the deep learning model and shallow representations handcrafted by an expert and produced through the recognition pipeline, and finally, (6) improve the results further by combining multiple different deep learning models into an ensemble and taking the maximum posterior probability.","PeriodicalId":109414,"journal":{"name":"2019 First International Conference of Intelligent Computing and Engineering (ICOICE)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 First International Conference of Intelligent Computing and Engineering (ICOICE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICOICE48418.2019.9035136","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The traditional approach to solving the object recognition problem requires image representations to be extracted first and then fed to a learning model, such as an SVM, to learn the classification decision boundary. These representations are handcrafted and heavily engineered by running the object image through a sequence of pipeline processes that require good prior knowledge of the problem domain. In end-to-end deep learning models, by contrast, the image representations and the classification decision boundary are learnt directly from the raw image pixels, requiring no prior knowledge of the problem domain. Moreover, deep model features are more discriminative than handcrafted ones, since the model is trained to discriminate between features belonging to different classes. The purpose of this study is sixfold: (1) review the literature on the pipeline processes used in the previous state-of-the-art codebook model approach to generic object recognition; (2) introduce several enhancements to the local feature extraction and normalization processes of the recognition pipeline; (3) apply the proposed enhancements to different encoding methods and contrast the results with previous ones; (4) experiment with current state-of-the-art deep model architectures used for object recognition; (5) compare deep representations extracted from the deep learning models with shallow representations handcrafted by an expert through the recognition pipeline; and finally, (6) improve the results further by combining multiple deep learning models into an ensemble and taking the maximum posterior probability.
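For context, the sketch below outlines the classical codebook (bag-of-visual-words) pipeline the abstract contrasts with deep models, assuming local descriptors (e.g. dense SIFT) have already been extracted for each image. Helper names such as build_codebook and encode_bow are illustrative and not taken from the paper.

```python
# Minimal sketch of the codebook pipeline: cluster local descriptors into
# visual words, encode each image as a normalized word histogram, and train
# a linear SVM on the resulting shallow representations.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC


def build_codebook(all_descriptors, k=256, seed=0):
    """Cluster the pooled local descriptors into k visual words."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(all_descriptors)


def encode_bow(descriptors, codebook):
    """Hard-assignment histogram encoding followed by L2 normalization."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: 20 "images", each with 100 local 128-D descriptors.
    images = [rng.normal(size=(100, 128)) for _ in range(20)]
    labels = rng.integers(0, 2, size=20)

    codebook = build_codebook(np.vstack(images), k=32)
    X = np.array([encode_bow(d, codebook) for d in images])

    clf = LinearSVC(C=1.0).fit(X, labels)  # learn the decision boundary
    print("train accuracy:", clf.score(X, labels))
```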
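The ensemble step of contribution (6) can be read as per-class max-fusion of the posteriors produced by several trained models, followed by picking the most confident class. The sketch below illustrates that reading; it is an assumption about the fusion rule, not the authors' exact implementation.

```python
# Max-posterior fusion over an ensemble of models (illustrative reading).
import numpy as np


def max_posterior_ensemble(posteriors):
    """posteriors: array of shape (n_models, n_images, n_classes).

    Returns the predicted class index per image after max-fusion over models.
    """
    fused = posteriors.max(axis=0)   # per-class maximum posterior over models
    return fused.argmax(axis=1)      # most confident class per image


if __name__ == "__main__":
    # Two hypothetical models scoring three images over four classes.
    p = np.array([
        [[0.7, 0.1, 0.1, 0.1],
         [0.2, 0.5, 0.2, 0.1],
         [0.1, 0.1, 0.2, 0.6]],
        [[0.4, 0.3, 0.2, 0.1],
         [0.1, 0.2, 0.6, 0.1],
         [0.3, 0.3, 0.2, 0.2]],
    ])
    print(max_posterior_ensemble(p))  # -> [0 2 3]
```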