PartSeg: Few-shot part segmentation via part-aware prompt learning

IF 7.5 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Bo Du
Pattern Recognition, Volume 162, Article 111326. Journal Article. Published 2025-01-02. DOI: 10.1016/j.patcog.2024.111326
https://www.sciencedirect.com/science/article/pii/S003132032401077X
Citations: 0

Abstract

In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. Leveraging the textual space of a powerful pre-trained image-language model, such as CLIP, has been found to substantially enhance the learning of visual features in few-shot tasks. However, CLIP-based methods primarily focus on high-level visual features that are fully aligned with textual features representing the “summary” of the image, and they often struggle to understand the concept of object parts through textual descriptions. To address this, we propose PartSeg, a novel method that learns part-aware prompts to grasp the concept of “part” and better utilize the textual space of CLIP for few-shot part segmentation. Specifically, we design a part-aware prompt learning module whose part-specific prompt generator produces part-specific tokens for each part class. Furthermore, since the concept of the same part is general across different object categories, we establish relationships between these parts to estimate part-shared tokens during prompt learning. Finally, the part-specific and part-shared tokens are combined with the textual tokens encoded from textual descriptions of parts (i.e., part labels) to form the part-aware prompt, which is used to generate textual prototypes for segmentation. Extensive experiments on the PartImageNet and Pascal_Part datasets demonstrate that our proposed method achieves state-of-the-art performance.
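To make the prompt composition described in the abstract concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that a part-aware prompt is a plain concatenation of part-specific, part-shared, and label tokens, that part-shared tokens can be modeled as a lookup keyed by part name (the abstract does not detail how relationships between parts are used to estimate them), and that a mean-pooling function can stand in for the CLIP text encoder. All names, token counts, and dimensions (`build_prompt`, `toy_text_encoder`, `segment`, `DIM`) are hypothetical, and the tokens are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding width (assumption, not from the paper)

# Part classes as (object category, part name) pairs; names are illustrative.
part_classes = [("dog", "head"), ("dog", "leg"), ("cat", "head"), ("cat", "leg")]
part_names = ["head", "leg"]

# Random stand-ins for tokens that would be learned or encoded in practice.
specific_tokens = {pc: rng.normal(size=(2, DIM)) for pc in part_classes}  # per part class
shared_tokens = {name: rng.normal(size=(2, DIM)) for name in part_names}  # per part name
label_tokens = {pc: rng.normal(size=(3, DIM)) for pc in part_classes}     # from part labels


def build_prompt(part_class):
    """Concatenate part-specific, part-shared, and label tokens along the token axis."""
    _, part_name = part_class
    return np.concatenate(
        [specific_tokens[part_class], shared_tokens[part_name], label_tokens[part_class]],
        axis=0,
    )


def toy_text_encoder(prompt):
    """Hypothetical stand-in for a text encoder: mean-pool tokens, then L2-normalize."""
    pooled = prompt.mean(axis=0)
    return pooled / np.linalg.norm(pooled)


def segment(pixel_features, prototypes):
    """Label each pixel with the index of its most cosine-similar textual prototype."""
    feats = pixel_features / np.linalg.norm(pixel_features, axis=-1, keepdims=True)
    return (feats @ prototypes.T).argmax(axis=-1)


# One textual prototype per part class, then a toy H x W x DIM feature map.
prototypes = np.stack([toy_text_encoder(build_prompt(pc)) for pc in part_classes])
pixel_features = rng.normal(size=(4, 5, DIM))
mask = segment(pixel_features, prototypes)  # H x W array of part-class indices
```

In this sketch the "head" tokens are reused for both the dog and the cat prompts, which is one simple way to express the idea that the same part is general across object categories. A real system would replace the random tokens with learned parameters and the pooling function with the actual text encoder.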

Source journal
Pattern Recognition (Engineering & Technology: Electrical and Electronic Engineering)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.