Unbiased scene graph generation via head-tail cooperative network with self-supervised learning

IF 4.2 · Zone 3, Computer Science · JCR Q2, Computer Science, Artificial Intelligence

Image and Vision Computing Pub Date : 2024-09-22 DOI:10.1016/j.imavis.2024.105283


Citations: 0

Abstract

Scene Graph Generation (SGG) is a critical task in image understanding, but it faces the challenge of head-biased prediction caused by the long-tail distribution of predicates. Current debiased SGG methods tend to prioritize improving the prediction of tail predicates while ignoring the substantial sacrifice of head predicates, which shifts the model from head bias to tail bias. To address this issue, we propose a Head-Tail Cooperative network with self-supervised Learning (HTCL), which achieves unbiased SGG by combining head-prefer and tail-prefer predictions through learnable weight parameters. HTCL employs a tail-prefer feature encoder that re-represents predicate features: it injects self-supervised learning, which focuses on the intrinsic structure of features, into the supervised learning of SGG, constraining the representation of predicate features so that tail samples become more distinguishable. We demonstrate the effectiveness of HTCL by applying it to the VG150, Open Images V6 and GQA200 datasets. The results show that HTCL achieves higher mean Recall with a minimal sacrifice in Recall and reaches new state-of-the-art overall performance. Our code is available at https://github.com/wanglei0618/HTCL.
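To make the trade-off between Recall and mean Recall concrete, here is a minimal sketch of how the two metrics diverge on a long-tailed predicate distribution. It is not taken from the HTCL code: the function name `recall_and_mean_recall`, the toy predicates and the hit lists are all hypothetical, and the sketch pools relations over a whole dataset rather than following the per-image, top-K protocol used in SGG benchmarks.

```python
from collections import defaultdict


def recall_and_mean_recall(gt_predicates, hits):
    """Compute pooled Recall and mean Recall (hypothetical helper).

    gt_predicates: predicate label of each ground-truth relation.
    hits: parallel booleans, True if that relation was recovered
          among the model's top-K predictions.
    """
    total = defaultdict(int)
    found = defaultdict(int)
    for pred, hit in zip(gt_predicates, hits):
        total[pred] += 1
        found[pred] += int(hit)
    # Recall: fraction of all ground-truth relations recovered.
    recall = sum(found.values()) / sum(total.values())
    # Mean Recall: per-predicate recall, averaged with equal class weight.
    mean_recall = sum(found[p] / total[p] for p in total) / len(total)
    return recall, mean_recall


# Toy long-tailed data (assumed): 8 head relations "on", 2 tail "parked on".
gt = ["on"] * 8 + ["parked on"] * 2

# Head-biased model: recovers every "on", misses every "parked on".
head_biased = [True] * 8 + [False] * 2
# Tail-biased model: recovers every "parked on", only half of the "on".
tail_biased = [True] * 4 + [False] * 4 + [True] * 2

r_head, mr_head = recall_and_mean_recall(gt, head_biased)
r_tail, mr_tail = recall_and_mean_recall(gt, tail_biased)
print(f"head-biased: R={r_head:.2f}, mR={mr_head:.2f}")
print(f"tail-biased: R={r_tail:.2f}, mR={mr_tail:.2f}")
```

In this toy setting the head-biased model scores higher Recall but lower mean Recall, and the tail-biased model shows the reverse, which is the head-to-tail shift the abstract describes.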




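Source journal

Image and Vision Computing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months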

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.