Hesen Chen, Jingyu Wang, Q. Qi, Yujian Li, Haifeng Sun
Bilinear CNN Models for Food Recognition
2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), published 2017-11-01
DOI: 10.1109/DICTA.2017.8227411 (https://doi.org/10.1109/DICTA.2017.8227411)
Citations: 12
Abstract
Because food types are diverse and different dishes can differ only slightly in appearance, food image recognition has become a new challenge in computer vision. Recent efforts to tackle this problem focus on designing hand-crafted features or on extracting features automatically with deep convolutional neural networks. Although these methods have reported a series of successes, their general architectures fail to adequately capture the fine-grained features that distinguish similar dishes. Inspired by the bilinear CNN models used in fine-grained classification, we adopt a similar structure in which two deep convolutional networks serve as feature extractors and their outputs are fused to obtain fine-grained features. These features are then used to train the food classifier. We conducted experiments on three public benchmark food datasets to evaluate the proposed architecture. The experiments show that our method is comparable to existing approaches.
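The abstract does not spell out how the two extractor outputs are fused. As an illustration only, the sketch below assumes the pooling commonly associated with bilinear CNN models: the outer product of the two feature vectors at each spatial location, summed over locations, followed by a signed square root and L2 normalization. The function name `bilinear_pool`, the feature-map shapes, and the random inputs standing in for CNN activations are all hypothetical and are not taken from the paper.

```python
import numpy as np


def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Fuse two feature maps of shape (H, W, C_a) and (H, W, C_b).

    Assumed fusion (not confirmed by the abstract): sum of per-location
    outer products, then signed square root and L2 normalization.
    """
    h, w, c_a = feat_a.shape
    if feat_b.shape[:2] != (h, w):
        raise ValueError("both feature maps must share the same spatial size")

    # Flatten the spatial grid: one row per location.
    a = feat_a.reshape(h * w, c_a)
    b = feat_b.reshape(h * w, -1)

    # a.T @ b equals the sum over locations of outer(a[i], b[i]).
    pooled = a.T @ b  # shape (C_a, C_b)

    # Flatten to a single descriptor vector.
    vec = pooled.reshape(-1)

    # Signed square root, then L2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)


# Random arrays stand in for the outputs of the two CNN feature extractors.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((7, 7, 8))
feat_b = rng.standard_normal((7, 7, 16))

descriptor = bilinear_pool(feat_a, feat_b)
print(descriptor.shape)  # (128,)
```

Under these assumptions, the resulting descriptor would be the input to the food classifier. Its length is the product of the two channel counts, so with real CNN feature maps it grows quickly.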