FoodSAM: Any Food Segmentation

ArXiv Pub Date : 2023-08-11 DOI:10.48550/arXiv.2308.05938

Xing Lan, Jiayi Lyu, Han Jiang, Kunkun Dong, Zehai Niu, Yi Zhang, Jian Xue



Abstract

In this paper, we explore the zero-shot capability of the Segment Anything Model (SAM) for food image segmentation. To address the lack of class-specific information in SAM-generated masks, we propose a novel framework called FoodSAM. This approach integrates the coarse semantic mask with SAM-generated masks to enhance semantic segmentation quality. In addition, we recognize that the ingredients in food can be regarded as independent instances, which motivates us to perform instance segmentation on food images. Furthermore, FoodSAM extends its zero-shot capability to panoptic segmentation by incorporating an object detector, which enables FoodSAM to capture non-food object information effectively. Drawing inspiration from the recent success of promptable segmentation, we also extend FoodSAM to promptable segmentation, supporting various prompt variants. Consequently, FoodSAM emerges as an all-encompassing solution capable of segmenting food items at multiple levels of granularity. Notably, this framework is the first work to achieve instance, panoptic, and promptable segmentation on food images. Extensive experiments demonstrate the feasibility and impressive performance of FoodSAM, validating SAM's potential as a prominent and influential tool within the domain of food image segmentation. We release our code at https://github.com/jamesjg/FoodSAM.
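The abstract does not spell out how the coarse semantic mask and the SAM-generated masks are combined. One plausible reading is a per-mask majority vote: each class-agnostic mask takes the most frequent class label that the coarse semantic mask assigns to its pixels. The sketch below illustrates that idea only. The function name `merge_masks`, the nested-list data layout, and the larger-masks-first ordering are assumptions made for illustration and are not taken from the FoodSAM code.

```python
from collections import Counter
from typing import List

Grid = List[List[int]]   # per-pixel class ids from a coarse semantic segmenter
Mask = List[List[bool]]  # one class-agnostic binary mask (e.g. from SAM)


def merge_masks(coarse: Grid, masks: List[Mask]) -> Grid:
    """Relabel every mask region with the majority class it covers (hypothetical helper)."""
    refined = [row[:] for row in coarse]  # copy so the input is left untouched
    # Assumption: paint larger masks first so smaller masks drawn later stay visible.
    ordered = sorted(masks, key=lambda m: sum(map(sum, m)), reverse=True)
    for mask in ordered:
        pixels = [(r, c)
                  for r, row in enumerate(mask)
                  for c, on in enumerate(row) if on]
        if not pixels:
            continue  # an empty mask carries no votes
        # Votes always come from the original coarse labels, not the refined ones.
        votes = Counter(coarse[r][c] for r, c in pixels)
        label = votes.most_common(1)[0][0]
        for r, c in pixels:
            refined[r][c] = label
    return refined


# Toy example: class 1 dominates the masked region, so the stray class-2 pixel is relabeled.
coarse = [
    [1, 1, 2, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
mask = [
    [True, True, True, False],
    [True, True, True, False],
    [False, False, False, False],
]
refined = merge_masks(coarse, [mask])
```

In this toy case the masked region contains five pixels of class 1 and one of class 2, so the whole region is labeled 1 while pixels outside any mask keep their coarse label. A real pipeline would work on array images and would need a policy for overlapping masks and vote ties.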

