MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition

IF 2.9 | Zone 3, Agricultural and Forestry Sciences | Q2, Food Science & Technology
Yao Rao, Chaofeng Li, Feiran Xu, Ya Guo
Journal of Food Measurement and Characterization, volume 18, issue 11, pages 9233-9251
Published: 2024-09-26
DOI: 10.1007/s11694-024-02874-3
Article page: https://link.springer.com/article/10.1007/s11694-024-02874-3
Citations: 0

Abstract


Efficient and accurate fruit recognition is critical for applications such as automated fruit-picking systems, quality evaluation, and self-checkout services in supermarkets. Existing vision-based methods, which rely primarily on Convolutional Neural Networks (CNNs), often achieve high performance but are hindered by high computational complexity, which makes real-time deployment on edge devices challenging. Moreover, the diversity of and similarity among fruit varieties, along with imbalanced fruit datasets, pose significant obstacles to general-purpose deep learning algorithms. To address these challenges, we propose the Multi-Scale Attention Pyramid Vision Transformer (MSAPVT) alongside an enhanced version of the Fru92 dataset. MSAPVT introduces four improvements: attention enhancement, dimension adjustment, multi-scale feature aggregation, and loss function improvement. First, the Hybrid Attention Module (HAM) is designed to better refine the multi-level features of the Pyramid Vision Transformer v2 (PVTv2). Second, the Dimension Adjustment Layer (DAL) is designed to increase the weight of the high-level features. Third, a multi-scale feature aggregation strategy is introduced to fuse complementary features across scales. Finally, a KL-divergence loss is added to enhance the difference between multi-scale features. These innovations enable MSAPVT to capture fine-grained details in fruit images and to generate highly discriminative representations with slightly lower model complexity. Our model achieves the best results on the Fru92 and Fru92s datasets, with Top-1 accuracy of 91.40% and 94.29%, and Top-5 accuracy of 98.95% and 99.55%, respectively. Lastly, an approachable and efficient fruit classification system based on MSAPVT is devised for potential applications. The improved dataset is available at https://github.com/iamraoyao/MSAPVT-Inference-Demo.
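
The abstract reports Top-1 and Top-5 accuracy and mentions a KL-divergence term computed between multi-scale features, but it gives neither the evaluation code nor the exact loss. The following is only a generic sketch of those two quantities in plain Python, not the authors' implementation: the function names (softmax, kl_divergence, top_k_accuracy), the toy logits, and the idea of comparing two per-scale class distributions are all assumptions made for illustration. How the KL term is signed or weighted in MSAPVT's objective is not stated in the abstract and is not modeled here.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes.

    eps guards against log(0); it is an illustrative choice, not from the paper.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def top_k_accuracy(all_logits, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for logits, label in zip(all_logits, labels):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (hypothetical): three samples, three classes.
toy_logits = [[2.0, 1.0, 0.1], [0.2, 0.1, 3.0], [0.5, 2.5, 2.0]]
toy_labels = [0, 2, 2]

top1 = top_k_accuracy(toy_logits, toy_labels, k=1)
top2 = top_k_accuracy(toy_logits, toy_labels, k=2)

# Two hypothetical per-scale predictions for the same image.
p_scale_a = softmax([2.0, 1.0, 0.1])
p_scale_b = softmax([0.5, 2.5, 2.0])
kl_ab = kl_divergence(p_scale_a, p_scale_b)
kl_aa = kl_divergence(p_scale_a, p_scale_a)

print(f"top-1: {top1:.3f}, top-2: {top2:.3f}, KL(a||b): {kl_ab:.4f}")
```

In this toy example the third sample is misclassified at Top-1 but recovered at Top-2, which is the same effect that makes Top-5 accuracy higher than Top-1 accuracy in the reported results. The KL value is zero when the two distributions are identical and grows as they diverge.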

Source journal
Journal of Food Measurement and Characterization (Agricultural and Biological Sciences: Food Science)
CiteScore: 6.00
Self-citation rate: 11.80%
Articles published: 425
About the journal: This interdisciplinary journal publishes new measurement results, characteristic properties, differentiating patterns, and measurement methods and procedures for purposes such as food process innovation, product development, quality control, and safety assurance. The journal encompasses all topics related to food property measurement and characterization, including all types of measured properties of food and food materials, features and patterns, measurement principles and techniques, development and evaluation of technologies, novel uses and applications, and industrial implementation of systems and procedures.