Fine-grained food image recognition using a convolutional neural network and swin transformer hybrid model
Zhiyong Xiao, Guang Diao, Chaoliang Liu, Zhaohong Deng
Journal of Food Composition and Analysis, volume 148, Article 108395. Published 2025-10-01. DOI: 10.1016/j.jfca.2025.108395
Impact factor: 4.6. JCR: Q2 (Chemistry, Applied).
https://www.sciencedirect.com/science/article/pii/S0889157525012116
Citations: 0
Abstract
With growing public emphasis on dietary monitoring and quality of life, fine-grained food image recognition has become an important research area in computer vision. However, distinguishing visually similar food items remains challenging, and traditional classification methods often fail to achieve satisfactory accuracy. To address this, the paper proposes a novel CNN-Transformer-based model that integrates convolutional neural networks (CNNs) with attention mechanisms. Specifically, the model introduces a Global Attention and Local Covariance Convolutional Feature Fusion module into the Swin Transformer framework. This module combines a deep convolutional network, a multi-layer perceptron, and a feature fusion component, so that it captures fine-grained details while integrating global context. In experiments on two public fine-grained food image datasets, FoodX-251 and UEC Food-256, the model achieves accuracies of 81.47% and 83.44%, respectively, outperforming most existing methods under the same experimental conditions.
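The abstract names the ingredients of the fusion module (a global attention branch, a local covariance branch over convolutional features, and a fusion step) but does not say how each is computed. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation. Every function name, the mean-token query, the upper-triangle flattening, and the concatenate-then-normalize fusion are assumptions chosen for illustration.

```python
import math


def attention_pool(tokens):
    """Global branch (assumed form): softmax-weighted average of tokens.

    Each token is scored against the mean token, used here as a stand-in
    query, with the usual 1/sqrt(dim) scaling.
    """
    dim = len(tokens[0])
    query = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    scores = [sum(q * x for q, x in zip(query, t)) / math.sqrt(dim) for t in tokens]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]


def covariance_pool(features):
    """Local branch (assumed form): channel covariance of local features.

    Returns the flattened upper triangle of the covariance matrix, since
    the matrix is symmetric.
    """
    n, dim = len(features), len(features[0])
    mean = [sum(f[i] for f in features) / n for i in range(dim)]
    centered = [[f[i] - mean[i] for i in range(dim)] for f in features]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    return [cov[i][j] for i in range(dim) for j in range(i, dim)]


def fuse(global_vec, local_vec):
    """Fusion (assumed form): concatenate both branches, then L2-normalize."""
    fused = global_vec + local_vec
    norm = math.sqrt(sum(v * v for v in fused)) or 1.0
    return [v / norm for v in fused]


# Toy inputs: three transformer tokens and three local CNN feature vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
fused = fuse(attention_pool(tokens), covariance_pool(local))
print(len(fused))  # 5 = 2 global entries + 3 covariance entries
```

In a real model the fused vector would feed a classifier head, and both branches would operate on learned feature maps rather than hand-written lists. The sketch only shows how first-order global context and second-order local statistics can be combined into one descriptor.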
About the journal:
The Journal of Food Composition and Analysis publishes manuscripts on scientific aspects of data on the chemical composition of human foods, with particular emphasis on actual data on composition of foods; analytical methods; studies on the manipulation, storage, distribution and use of food composition data; and studies on the statistics, use and distribution of such data and data systems. The Journal's basis is nutrient composition, with increasing emphasis on bioactive non-nutrient and anti-nutrient components. Papers must provide sufficient description of the food samples, analytical methods, quality control procedures and statistical treatments of the data to permit the end users of the food composition data to evaluate the appropriateness of such data in their projects.
The Journal does not publish papers on: microbiological compounds; sensory quality; aromatics/volatiles in food and wine; essential oils; organoleptic characteristics of food; physical properties; or clinical papers and pharmacology-related papers.