From ensemble to knowledge distillation: Improving large-scale food recognition

IF 8 · Zone 2 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence · Pub Date: 2025-04-02 · DOI: 10.1016/j.engappai.2025.110727

Liming Nong, Guohao Peng, Tianyang Xu, Jinlin Zhu

Volume 151, Article 110727. Full text: https://www.sciencedirect.com/science/article/pii/S0952197625007274

Citations: 0

Abstract

Food recognition on a large scale presents significant challenges due to high intra-category similarity and inter-category variability. Addressing these challenges is crucial for developing robust and accurate food recognition systems, which have applications in health monitoring, dietary assessment, and automated food logging. This study aims to tackle these issues by employing ensemble learning and knowledge distillation. We use ensemble learning to effectively combine the local perception capability of convolutional neural networks (CNNs) and the global modeling capability of Vision Transformers. The synergistic ensemble enhances the model's ability to discern subtle differences within categories and capture a spectrum of diverse patterns across various categories. To reduce the number of base models in an ensemble, we employed a method combining knowledge distillation and re-ensembling. Specifically, we used the collective knowledge of four base models to guide the re-learning process of student models. Subsequently, we re-ensembled these distilled models, significantly enhancing the recognition performance of the ensemble while maintaining the same computational efficiency. Finally, we fine-tuned the optimal ensemble weights to further boost the recognition performance of the ensemble model. We conducted extensive experiments on the large-scale food datasets Food2k and CNFood241, achieving state-of-the-art performance. Specifically, on the Food2k dataset, our method achieved a top-1 accuracy of 86.22% with 131.56M parameters, outperforming the state-of-the-art algorithms by 2.1%, demonstrating its effectiveness.
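The abstract does not give the loss function or the weighting scheme, so the following is only a minimal sketch of the general idea: several base models are combined by a weighted average of their class probabilities, and that ensemble output serves as the soft target for a student. The temperature-scaled loss (a common distillation formulation), the equal weights, the three-class logits, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_ensemble(prob_lists, weights):
    """Weighted average of per-model class probabilities (weights are normalized)."""
    total_w = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_lists)) / total_w
        for c in range(n_classes)
    ]

def distillation_loss(student_logits, teacher_probs, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student at T) + (1 - alpha) * cross-entropy.

    This loss form and the values of temperature and alpha are assumptions.
    """
    s_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher_probs, s_soft) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Hypothetical logits from four base models for one image with three classes.
base_logits = [
    [2.0, 1.0, 0.1],
    [1.5, 1.2, 0.3],
    [2.2, 0.4, 0.9],
    [1.8, 1.1, 0.2],
]
T = 2.0
# Equal weights here; the paper reports tuning ensemble weights afterwards.
teacher = weighted_ensemble([softmax(l, T) for l in base_logits], [1, 1, 1, 1])
loss = distillation_loss([1.0, 0.8, 0.1], teacher, label=0, temperature=T)
print(teacher, loss)
```

In this sketch, re-ensembling would amount to calling `weighted_ensemble` again on the outputs of the distilled students, with the weights then searched on a validation set.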


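Source journal: Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60

Self-citation rate: 10.00%

Articles published: 505

Review time: 68 days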

About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.