Informative Scene Graph Generation via Debiasing

IF 9.3 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision Pub Date : 2025-02-24 DOI:10.1007/s11263-025-02365-y

Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song

{"title":"Informative Scene Graph Generation via Debiasing","authors":"Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song","doi":"10.1007/s11263-025-02365-y","DOIUrl":null,"url":null,"abstract":"Scene graph generation aims to detect visual relationship triplets, (subject, predicate, object). Due to biases in data, current models tend to predict common predicates, e.g., “on” and “at”, instead of informative ones, e.g., “standing on” and “looking at”. This tendency results in the loss of precise information and overall performance. If a model only uses “stone on road” rather than “stone blocking road” to describe an image, it may be a grave misunderstanding. We argue that this phenomenon is caused by two imbalances: semantic space level imbalance and training sample level imbalance. For this problem, we propose DB-SGG, an effective framework based on debiasing but not the conventional distribution fitting. It integrates two components: Semantic Debiasing (SD) and Balanced Predicate Learning (BPL), for these imbalances. SD utilizes a confusion matrix and a bipartite graph to construct predicate relationships. BPL adopts a random undersampling strategy and an ambiguity removing strategy to focus on informative predicates. Benefiting from the model-agnostic process, our method can be easily applied to SGG models and outperforms Transformer by \\(136.3\\%\\), \\(119.5\\%\\), and \\(122.6\\%\\) on mR@20 at three SGG sub-tasks on the SGG-VG dataset. Our method is further verified on another complex SGG dataset (SGG-GQA) and two downstream tasks (sentence-to-graph retrieval and image captioning).","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"3 1","pages":""},"PeriodicalIF":9.3000,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11263-025-02365-y","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Scene graph generation aims to detect visual relationship triplets, (subject, predicate, object). Due to biases in data, current models tend to predict common predicates, e.g., “on” and “at”, instead of informative ones, e.g., “standing on” and “looking at”. This tendency results in the loss of precise information and overall performance. If a model only uses “stone on road” rather than “stone blocking road” to describe an image, it may be a grave misunderstanding. We argue that this phenomenon is caused by two imbalances: semantic space level imbalance and training sample level imbalance. For this problem, we propose DB-SGG, an effective framework based on debiasing but not the conventional distribution fitting. It integrates two components: Semantic Debiasing (SD) and Balanced Predicate Learning (BPL), for these imbalances. SD utilizes a confusion matrix and a bipartite graph to construct predicate relationships. BPL adopts a random undersampling strategy and an ambiguity removing strategy to focus on informative predicates. Benefiting from the model-agnostic process, our method can be easily applied to SGG models and outperforms Transformer by \(136.3\%\), \(119.5\%\), and \(122.6\%\) on mR@20 at three SGG sub-tasks on the SGG-VG dataset. Our method is further verified on another complex SGG dataset (SGG-GQA) and two downstream tasks (sentence-to-graph retrieval and image captioning).

查看原文本刊更多论文

通过去偏生成信息场景图

场景图生成的目的是检测视觉关系三元组（主语、谓语、宾语）。由于数据的偏差，目前的模型倾向于预测常见的谓词，例如“on”和“at”，而不是信息性的谓词，例如“standing on”和“looking at”。这种趋势会导致精确信息和整体性能的丢失。如果一个模型只使用“石头在路上”而不是“石头堵路”来描述一个图像，这可能是一个严重的误解。我们认为这种现象是由语义空间层次的不平衡和训练样本层次的不平衡造成的。针对这个问题，我们提出了DB-SGG，这是一种基于去偏而不是传统分布拟合的有效框架。它集成了两个组件：语义去偏（SD）和平衡谓词学习（BPL），以解决这些不平衡。SD利用混淆矩阵和二部图来构造谓词关系。BPL采用随机欠采样策略和去歧义策略，重点关注信息性谓词。得益于模型不可知的过程，我们的方法可以很容易地应用于SGG模型，并且在SGG- vg数据集上的三个SGG子任务上，我们的方法优于Transformer \(136.3\%\)、\(119.5\%\)和\(122.6\%\) （mR@20）。我们的方法在另一个复杂的SGG数据集（SGG- gqa）和两个下游任务（句子到图检索和图像字幕）上进一步验证。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of Computer Vision 工程技术-计算机：人工智能

CiteScore

29.80

自引率

2.10%

发文量

163

审稿时长

6 months

期刊介绍： The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs. Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision. Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes. They provide a quicker means for sharing new findings with the computer vision community. Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art in computer vision or offer tutorial presentations of relevant topics. These articles provide comprehensive and insightful overviews of specific subject areas. In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives. The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research. Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.