Multi-Level Contextual Prototype Modulation for Compositional Zero-Shot Learning

IF 13.7
Yang Liu;Xinshuo Wang;Xinbo Gao;Jungong Han;Ling Shao
Journal: IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society), vol. 34, pp. 4856-4868
Publication date: 2025-07-30
Publication type: Journal Article
DOI: 10.1109/TIP.2025.3592560
URL: https://ieeexplore.ieee.org/document/11104968/
Citations: 0

Abstract

Multi-Level Contextual Prototype Modulation for Compositional Zero-Shot Learning
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object compositions by leveraging prior knowledge of known primitives. However, real-world visual features of attributes and objects are often entangled, causing distribution shifts between seen and unseen combinations. Existing methods often ignore intrinsic variations and interactions among primitives, leading to poor feature discrimination and biased predictions. To address these challenges, we propose Multi-level Contextual Prototype Modulation (MCPM), a transformer-based framework with a hierarchical structure that effectively integrates attributes and objects to generate richer visual embeddings. At the feature level, we apply contrastive learning to improve discriminability across compositional tasks. At the prototype level, a subclass-driven modulator captures fine-grained attribute-object interactions, enabling better adaptation to long-tail distributions. Additionally, we introduce a Minority Attribute Enhancement (MAE) strategy that synthesizes virtual samples by mixing attribute classes, further mitigating data imbalance. Experiments on four benchmark datasets (MIT-States, C-GQA, UT-Zappos, and VAW-CZSL) show that MCPM brings significant performance improvements, verifying its effectiveness in complex composition scenes.
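Two of the ideas named in the abstract, contrastive learning at the feature level and synthesizing virtual samples by mixing attribute classes, can be illustrated with a small sketch. The abstract does not give the actual loss or mixing rule, so the code below swaps in two well-known stand-ins as assumptions: an InfoNCE-style contrastive loss and a mixup-style convex combination of features and attribute labels. The function names (`mix_minority_attribute`, `info_nce`, `cosine`) and the `temperature` and `lam` parameters are hypothetical and are not taken from the paper.

```python
import math


def mix_minority_attribute(feat_a, feat_b, label_a, label_b, lam):
    """Blend two samples and their attribute label vectors.

    Assumption: a mixup-style convex combination with weight lam in [0, 1].
    The paper's own mixing rule may differ.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed_feat = [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_feat, mixed_label


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for a single anchor.

    Computes -log( exp(s_pos) / sum_k exp(s_k) ), where s are cosine
    similarities scaled by 1 / temperature and the sum runs over the
    positive and all negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Mix a sample of a rare attribute (class 0) with one of a common attribute (class 1).
virtual_feat, virtual_label = mix_minority_attribute(
    [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam=0.25
)

# The loss is small when the positive is aligned with the anchor and large otherwise.
aligned_loss = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned_loss = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In mixup-style augmentation the weight is usually sampled rather than fixed, for example with `random.betavariate(alpha, alpha)` from the Python standard library. Whether MCPM samples its mixing weight this way is not stated in the abstract.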