Fine-grained food image recognition using a convolutional neural network and swin transformer hybrid model

IF 4.6; Zone 2, Agricultural and Forestry Sciences; JCR Q2, CHEMISTRY, APPLIED

Journal of Food Composition and Analysis Pub Date : 2025-10-01 DOI:10.1016/j.jfca.2025.108395

Zhiyong Xiao, Guang Diao, Chaoliang Liu, Zhaohong Deng

Journal of Food Composition and Analysis, volume 148, Article 108395. Journal Article. https://www.sciencedirect.com/science/article/pii/S0889157525012116

Citations: 0

Abstract

With increasing public emphasis on dietary monitoring and quality of life, fine-grained food image recognition has become an important research area in computer vision. However, distinguishing visually similar food items remains challenging, as traditional classification methods often fail to achieve satisfactory accuracy. To address this challenge, this paper proposes a novel CNN-Transformer-based model that integrates convolutional neural networks (CNNs) with attention mechanisms. Specifically, the model introduces a Global Attention and Local Covariance Convolutional Feature Fusion module into the Swin Transformer framework. This module combines a deep convolutional network, a multi-layer perceptron, and a feature fusion component, enabling better capture of fine-grained details while integrating global context. Extensive experiments conducted on two public fine-grained food image datasets, FoodX-251 and UEC Food-256, demonstrate the superior performance of the proposed model. It achieves accuracy rates of 81.47% and 83.44%, respectively, outperforming most existing methods under the same experimental conditions.
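The abstract does not specify the internals of the fusion module, so the following is only a minimal sketch of the general idea it names: combining a globally attention-pooled descriptor with second-order (covariance) statistics of local features. All function names, the single-query attention, and fusion by concatenation are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention_pool(tokens, query):
    """Global branch (assumed form): attention-weighted mean of token vectors.

    tokens: list of n feature vectors, each of length d
    query:  one vector of length d (hypothetical learned query)
    """
    d = len(query)
    scores = [
        sum(t[i] * query[i] for i in range(d)) / math.sqrt(d) for t in tokens
    ]
    weights = softmax(scores)
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]


def local_covariance(tokens):
    """Local branch (assumed form): upper triangle of the channel covariance.

    Returns d * (d + 1) / 2 values describing how channels co-vary
    across the n local feature vectors.
    """
    n, d = len(tokens), len(tokens[0])
    mean = [sum(t[i] for t in tokens) / n for i in range(d)]
    cov = []
    for i in range(d):
        for j in range(i, d):
            cov.append(
                sum((t[i] - mean[i]) * (t[j] - mean[j]) for t in tokens) / n
            )
    return cov


def fuse(tokens, query):
    """Fusion step (assumed): concatenate the global and local descriptors."""
    return global_attention_pool(tokens, query) + local_covariance(tokens)


if __name__ == "__main__":
    # Four toy 2-channel feature vectors standing in for patch features.
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    print(fuse(toy_tokens, query=[0.0, 0.0]))
```

In a real model, the token features would come from Swin Transformer stages and a convolutional branch, and the fused vector would feed a classifier head; the sketch only shows how first-order global context and second-order local statistics can be placed in one descriptor.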

Source journal

Journal of Food Composition and Analysis (Engineering and Technology: Food Science and Technology)

CiteScore: 6.20

Self-citation rate: 11.60%

Articles published: 601

Review time: 53 days

Journal description: The Journal of Food Composition and Analysis publishes manuscripts on scientific aspects of data on the chemical composition of human foods, with particular emphasis on actual data on composition of foods; analytical methods; studies on the manipulation, storage, distribution and use of food composition data; and studies on the statistics, use and distribution of such data and data systems. The Journal's basis is nutrient composition, with increasing emphasis on bioactive non-nutrient and anti-nutrient components. Papers must provide sufficient description of the food samples, analytical methods, quality control procedures and statistical treatments of the data to permit the end users of the food composition data to evaluate the appropriateness of such data in their projects. The Journal does not publish papers on: microbiological compounds; sensory quality; aromatics/volatiles in food and wine; essential oils; organoleptic characteristics of food; physical properties; or clinical papers and pharmacology-related papers.