Fine-grained food image recognition using a convolutional neural network and swin transformer hybrid model

Impact factor 4.6; Zone 2 (Agricultural and Forestry Sciences); JCR Q2 (Chemistry, Applied)
Zhiyong Xiao, Guang Diao, Chaoliang Liu, Zhaohong Deng
Journal of Food Composition and Analysis, volume 148, Article 108395. Published 2025-10-01.
DOI: 10.1016/j.jfca.2025.108395
https://www.sciencedirect.com/science/article/pii/S0889157525012116

Abstract

With increasing public emphasis on dietary monitoring and quality of life, fine-grained food image recognition has become an important research area in computer vision. However, distinguishing visually similar food items remains challenging, and traditional classification methods often fail to achieve satisfactory accuracy. To address this, the paper proposes a novel CNN-Transformer hybrid model that integrates convolutional neural networks (CNNs) with attention mechanisms. Specifically, the model introduces a Global Attention and Local Covariance Convolutional Feature Fusion module into the Swin Transformer framework. This module combines a deep convolutional network, a multi-layer perceptron, and a feature fusion component, so that the model captures fine-grained details more effectively while integrating global context. Extensive experiments on two public fine-grained food image datasets, FoodX-251 and UEC Food-256, yield accuracies of 81.47% and 83.44%, respectively, outperforming most existing methods under the same experimental conditions.
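The abstract does not spell out the internals of the fusion module, so the following is only a rough NumPy sketch of the general idea it names: a global self-attention branch, a local branch built on windowed channel covariance, and a small multi-layer perceptron that fuses the two. Every function name, shape, the 3x3 window, the single attention head, and fusion by concatenation are assumptions made for illustration, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def global_attention(tokens, w_q, w_k, w_v):
    # Single-head self-attention: every token attends to every other token.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # shape (N, N)
    return weights @ v, weights


def local_covariance(feature_map, window=3):
    # Channel covariance inside each window x window neighbourhood
    # (hypothetical stand-in for a "local covariance" branch).
    h, w, c = feature_map.shape
    pad = window // 2
    padded = np.pad(feature_map, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    cov = np.empty((h, w, c, c))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + window, j:j + window].reshape(-1, c)
            centered = patch - patch.mean(axis=0, keepdims=True)
            cov[i, j] = centered.T @ centered / (patch.shape[0] - 1)
    return cov


def fuse(global_feat, local_feat, w_hidden, w_out):
    # Concatenate both branches, then mix them with a two-layer MLP (ReLU).
    joint = np.concatenate([global_feat, local_feat], axis=1)
    hidden = np.maximum(joint @ w_hidden, 0.0)
    return hidden @ w_out


# Toy input: a 4x4 feature map with 4 channels and random weights.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 4
feature_map = rng.normal(size=(H, W, C))
tokens = feature_map.reshape(H * W, C)

w_q, w_k, w_v = (rng.normal(size=(C, C)) for _ in range(3))
w_hidden = rng.normal(size=(C + C * C, 8))
w_out = rng.normal(size=(8, C))

global_feat, attn = global_attention(tokens, w_q, w_k, w_v)
cov = local_covariance(feature_map)
fused = fuse(global_feat, cov.reshape(H * W, C * C), w_hidden, w_out)
print(fused.shape)  # (16, 4)
```

In this sketch the attention branch supplies context from the whole image, while the covariance branch summarises how channels co-vary within a small neighbourhood, which is one plausible way to expose fine local texture differences between visually similar dishes. A real implementation would operate on Swin Transformer stage outputs with learned weights rather than random ones.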
Source journal: Journal of Food Composition and Analysis (Engineering and Technology: Food Science and Technology)
CiteScore: 6.20
Self-citation rate: 11.60%
Articles published: 601
Review time: 53 days
About the journal: The Journal of Food Composition and Analysis publishes manuscripts on scientific aspects of data on the chemical composition of human foods, with particular emphasis on actual data on composition of foods; analytical methods; studies on the manipulation, storage, distribution and use of food composition data; and studies on the statistics, use and distribution of such data and data systems. The Journal's basis is nutrient composition, with increasing emphasis on bioactive non-nutrient and anti-nutrient components. Papers must provide sufficient description of the food samples, analytical methods, quality control procedures and statistical treatments of the data to permit the end users of the food composition data to evaluate the appropriateness of such data in their projects. The Journal does not publish papers on: microbiological compounds; sensory quality; aromatics/volatiles in food and wine; essential oils; organoleptic characteristics of food; physical properties; or clinical papers and pharmacology-related papers.