GNN-based primitive recombination for compositional zero-shot learning

IF 4.2 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Fuqin Deng, Caiyun Tang, Lanhui Fu, Wei Jin, Jiaming Zhong, Hongming Wang, Nannan Li
{"title":"基于gnn的成分零射击学习原语重组","authors":"Fuqin Deng ,&nbsp;Caiyun Tang ,&nbsp;Lanhui Fu ,&nbsp;Wei Jin ,&nbsp;Jiaming Zhong ,&nbsp;Hongming Wang ,&nbsp;Nannan Li","doi":"10.1016/j.imavis.2025.105762","DOIUrl":null,"url":null,"abstract":"<div><div>Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute–object combinations, with the core challenge being the complex visual manifestations across compositions. We posit that the key to address this challenge lies in enabling models to simulate human recognition processes by decomposing and dynamically recombining primitives (attributes and objects). Existing methods merely concatenate primitives after extraction to form new combinations, without achieving deep integration between attributes and objects to create truly novel compositions. To address this issue, we propose Graph Neural Network-based Primitive Recombination (GPR) framework. This framework innovatively designs a Primitive Recombination Module (PRM) based on the Compositional Matching Module (CMM). Specifically, we first extract primitives, and build independent attribute and object space based on the CLIP model, enabling more precise learning of primitive-level visual features and reducing information residuals. Additionally, we introduce a Virtual Composition Unit (VCU), which inputs optimized primitive features as nodes into GNN and models complex interaction relationships between attributes and objects through message propagation. The module performs mean pooling on the updated node features to obtain a recombined representation and fuses the global visual information from the original image through residual connections, generating semantically rich virtual compositional features while preserving key visual cues. We conduct extensive experiments on three CZSL benchmark datasets to show that GPR achieves state-of-the-art or competitive performance in both closed-world and open-world settings.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"163 ","pages":"Article 105762"},"PeriodicalIF":4.2000,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GNN-based primitive recombination for compositional zero-shot learning\",\"authors\":\"Fuqin Deng ,&nbsp;Caiyun Tang ,&nbsp;Lanhui Fu ,&nbsp;Wei Jin ,&nbsp;Jiaming Zhong ,&nbsp;Hongming Wang ,&nbsp;Nannan Li\",\"doi\":\"10.1016/j.imavis.2025.105762\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute–object combinations, with the core challenge being the complex visual manifestations across compositions. We posit that the key to address this challenge lies in enabling models to simulate human recognition processes by decomposing and dynamically recombining primitives (attributes and objects). Existing methods merely concatenate primitives after extraction to form new combinations, without achieving deep integration between attributes and objects to create truly novel compositions. To address this issue, we propose Graph Neural Network-based Primitive Recombination (GPR) framework. This framework innovatively designs a Primitive Recombination Module (PRM) based on the Compositional Matching Module (CMM). 
Specifically, we first extract primitives, and build independent attribute and object space based on the CLIP model, enabling more precise learning of primitive-level visual features and reducing information residuals. Additionally, we introduce a Virtual Composition Unit (VCU), which inputs optimized primitive features as nodes into GNN and models complex interaction relationships between attributes and objects through message propagation. The module performs mean pooling on the updated node features to obtain a recombined representation and fuses the global visual information from the original image through residual connections, generating semantically rich virtual compositional features while preserving key visual cues. We conduct extensive experiments on three CZSL benchmark datasets to show that GPR achieves state-of-the-art or competitive performance in both closed-world and open-world settings.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"163 \",\"pages\":\"Article 105762\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-10-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625003506\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625003506","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute–object combinations, with the core challenge being the complex visual manifestations across compositions. We posit that the key to addressing this challenge lies in enabling models to simulate human recognition processes by decomposing and dynamically recombining primitives (attributes and objects). Existing methods merely concatenate primitives after extraction to form new combinations, without achieving deep integration between attributes and objects to create truly novel compositions. To address this issue, we propose the Graph Neural Network-based Primitive Recombination (GPR) framework. This framework innovatively designs a Primitive Recombination Module (PRM) based on the Compositional Matching Module (CMM). Specifically, we first extract primitives and build independent attribute and object spaces based on the CLIP model, enabling more precise learning of primitive-level visual features and reducing information residuals. Additionally, we introduce a Virtual Composition Unit (VCU), which inputs optimized primitive features as nodes into a GNN and models complex interaction relationships between attributes and objects through message propagation. The module performs mean pooling on the updated node features to obtain a recombined representation and fuses the global visual information from the original image through residual connections, generating semantically rich virtual compositional features while preserving key visual cues. We conduct extensive experiments on three CZSL benchmark datasets to show that GPR achieves state-of-the-art or competitive performance in both closed-world and open-world settings.
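The abstract describes a concrete data flow for the VCU: attribute and object primitive features enter a GNN as nodes, interact through message propagation, are mean-pooled into a recombined representation, and are fused with the global image feature via a residual connection. The PyTorch sketch below illustrates that flow only; the two-node fully connected graph, the single message-passing round, the 512-dimensional feature size, and all class and variable names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class VCUSketch(nn.Module):
    """Illustrative Virtual Composition Unit: message passing over a
    two-node attribute-object graph, mean pooling, residual fusion.
    A minimal sketch of the mechanism the abstract describes, not the
    paper's actual architecture."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.message = nn.Linear(dim, dim)      # neighbor feature -> message
        self.update = nn.Linear(2 * dim, dim)   # [self, message] -> updated node

    def forward(self, attr_feat, obj_feat, global_feat):
        # Nodes: [batch, 2, dim]; node 0 = attribute, node 1 = object.
        nodes = torch.stack([attr_feat, obj_feat], dim=1)
        # Fully connected 2-node graph: each node's only neighbor is the other.
        neighbors = nodes.flip(dims=[1])
        messages = torch.relu(self.message(neighbors))
        nodes = self.update(torch.cat([nodes, messages], dim=-1))
        # Mean pooling over the updated nodes gives the recombined representation.
        recombined = nodes.mean(dim=1)
        # Residual connection fuses global visual information from the image.
        return recombined + global_feat


vcu = VCUSketch(dim=512)
attr = torch.randn(4, 512)     # attribute primitive features (stand-ins for
obj = torch.randn(4, 512)      # features from the CLIP-based attribute and
img = torch.randn(4, 512)      # object spaces, and the global image feature)
virtual = vcu(attr, obj, img)  # [4, 512] virtual compositional features
```

In the paper the node inputs would come from the CLIP-based attribute and object spaces the abstract describes; random tensors stand in for them here so the sketch runs standalone.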

Source journal
Image and Vision Computing (Engineering Technology; Engineering: Electrical & Electronic)
CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.