Prototype-based multi-domain self-distillation for unbiased scene graph generation

IF 6.5 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Yuan Gao, Yaochen Li, Yujie Zang, Jingze Liu, Yuehu Liu
DOI: 10.1016/j.neucom.2025.131625
Journal: Neurocomputing, Volume 658, Article 131625
Publication date: 2025-09-30
URL: https://www.sciencedirect.com/science/article/pii/S0925231225022970
Citations: 0

Abstract

Scene Graph Generation (SGG) plays an important role in reinforcing visual image understanding. Existing methods often encounter difficulties in effectively representing implicit relationship features, which limits their capacity to distinguish between predicates. Meanwhile, these approaches are susceptible to imbalanced instance distributions, hindering the efficient training of fine-grained predicates. To address these problems, we propose a novel prototype-based multi-domain self-distillation training framework. Specifically, a Multi-Domain Fusion (MDF) module is introduced to improve predicate feature representation by integrating global contextual information and local spatial-frequency domain information. Then, a Prototype Generation Network (PGN) is designed for building the class prototypes, which consists of the design of different granularity predicates and loss functions. Furthermore, we design two different data balancing strategies under the guidance of class prototypes, which correspond to mining the in-distribution and out-of-distribution information of the original data, respectively. The experimental results demonstrate that the proposed method is superior to the existing methods on VG, GQA and Open Images V6 datasets, which makes it more applicable to generating unbiased scene graph models.
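The paper itself provides no code; as a rough illustration of the prototype idea only, the sketch below builds class prototypes as mean feature vectors and classifies predicates by cosine similarity to the nearest prototype, with a toy FFT-based stand-in for the local spatial-frequency branch. All function names are hypothetical and the real MDF/PGN modules are learned networks, not these closed-form operations.

```python
import numpy as np

def spatial_frequency_features(patch):
    """Toy stand-in for a local spatial-frequency branch:
    concatenate raw (spatial) values with 2-D FFT magnitudes."""
    spec = np.abs(np.fft.fft2(patch)).ravel()
    return np.concatenate([patch.ravel(), spec])

def build_prototypes(features, labels, num_classes):
    """Class prototype = mean feature vector over each predicate class."""
    dim = features.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        protos[c] = features[labels == c].mean(axis=0)
    return protos

def classify(feature, prototypes):
    """Assign the class whose prototype is most cosine-similar."""
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int(np.argmax(p @ f))

# Tiny worked example: two well-separated predicate classes.
feats = np.array([[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]])
labels = np.array([0, 0, 1, 1])
protos = build_prototypes(feats, labels, num_classes=2)
pred = classify(np.array([1.0, 0.05]), protos)  # nearest to class-0 prototype
```

In the paper's framework, such prototypes additionally guide the two balancing strategies: instances far from their class prototype (in-distribution mining) or synthesized beyond it (out-of-distribution mining) receive different treatment during training.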
Source journal: Neurocomputing (Engineering/Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual article count: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.