Prototype-based multi-domain self-distillation for unbiased scene graph generation

IF 6.5 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Yuan Gao, Yaochen Li, Yujie Zang, Jingze Liu, Yuehu Liu
DOI: 10.1016/j.neucom.2025.131625
Journal: Neurocomputing, Volume 658, Article 131625
Publication date: 2025-09-30
URL: https://www.sciencedirect.com/science/article/pii/S0925231225022970
Citations: 0

Abstract

Scene Graph Generation (SGG) plays an important role in reinforcing visual image understanding. Existing methods often encounter difficulties in effectively representing implicit relationship features, which limits their capacity to distinguish between predicates. Meanwhile, these approaches are susceptible to imbalanced instance distributions, hindering the efficient training of fine-grained predicates. To address these problems, we propose a novel prototype-based multi-domain self-distillation training framework. Specifically, a Multi-Domain Fusion (MDF) module is introduced to improve predicate feature representation by integrating global contextual information and local spatial-frequency domain information. Then, a Prototype Generation Network (PGN) is designed for building the class prototypes, which consists of the design of different granularity predicates and loss functions. Furthermore, we design two different data balancing strategies under the guidance of class prototypes, which correspond to mining the in-distribution and out-of-distribution information of the original data, respectively. The experimental results demonstrate that the proposed method is superior to the existing methods on VG, GQA and Open Images V6 datasets, which makes it more applicable to generating unbiased scene graph models.
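The paper itself provides no code; as a rough illustration of the prototype idea only, the sketch below builds class prototypes as mean feature vectors and classifies predicates by cosine similarity to the nearest prototype, with a toy FFT-based stand-in for the local spatial-frequency branch. All function names are hypothetical and the real MDF/PGN modules are learned networks, not these closed-form operations.

```python
import numpy as np

def spatial_frequency_features(patch):
    """Toy stand-in for a local spatial-frequency branch:
    concatenate raw (spatial) values with 2-D FFT magnitudes."""
    spec = np.abs(np.fft.fft2(patch)).ravel()
    return np.concatenate([patch.ravel(), spec])

def build_prototypes(features, labels, num_classes):
    """Class prototype = mean feature vector over each predicate class."""
    dim = features.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        protos[c] = features[labels == c].mean(axis=0)
    return protos

def classify(feature, prototypes):
    """Assign the class whose prototype is most cosine-similar."""
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int(np.argmax(p @ f))

# Tiny worked example: two well-separated predicate classes.
feats = np.array([[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]])
labels = np.array([0, 0, 1, 1])
protos = build_prototypes(feats, labels, num_classes=2)
pred = classify(np.array([1.0, 0.05]), protos)  # nearest to class-0 prototype
```

In the paper's framework, such prototypes additionally guide the two balancing strategies: instances far from their class prototype (in-distribution mining) or synthesized beyond it (out-of-distribution mining) receive different treatment during training.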
Source journal: Neurocomputing (Engineering/Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual article count: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.