Multi-task visual food recognition by integrating an ontology supported with LLM
Authors: Daniel Ponte, Eduardo Aguilar, Mireia Ribera, Petia Radeva
Journal: Journal of Visual Communication and Image Representation, Volume 110, Article 104484 (impact factor 2.6; JCR Q2, Computer Science, Information Systems)
Published: 2025-05-29
DOI: 10.1016/j.jvcir.2025.104484
URL: https://www.sciencedirect.com/science/article/pii/S1047320325000987
Citation count: 0
Abstract
Food image analysis is a crucial task with far-reaching implications across various domains, including culinary arts, nutrition, and food technology. This paper presents a novel approach to multi-task visual food analysis, using large language models to obtain recipes and support the creation of a comprehensive food ontology. The approach integrates the food ontology into an end-to-end model, with prior knowledge on the relationships of food concepts at different semantic levels, within a multi-task deep learning visual food analysis approach, to generate better and more consistent class predictions. Evaluated on two benchmark datasets, MAFood-121 and VireoFood-172, this method demonstrates its effectiveness in single-label food recognition and multi-label food group classification. The ontology enhances accuracy, consistency, and generalization by effectively transferring knowledge to the learning model. This study underscores the potential of ontology-based methods to address food image classification complexities, with implications for broad applications, including automated recipe generation and nutritional assessment.
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.