Improved query specialization for transformer-based visual relationship detection

Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim

Information Sciences, Volume 723, Article 122668. DOI: 10.1016/j.ins.2025.122668. Published 2025-09-10. Available at https://www.sciencedirect.com/science/article/pii/S0020025525008011
Citations: 0
Abstract
Visual Relationship Detection (VRD) has significantly advanced with Transformer-based architectures. However, we identify two fundamental drawbacks in conventional label assignment methods used for training Transformer-based VRD models, where ground-truth (GT) annotations are matched to model predictions. In conventional assignment, queries are trained to detect all relations rather than specializing in specific ones, resulting in ‘unspecialized’ queries. Also, each ground-truth (GT) annotation is assigned to only one prediction under conventional assignment, suppressing other near-correct predictions by labeling them as ‘no relation’. To address these issues, we introduce a novel method called Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization clusters queries and relations into exclusive groups, promoting specialization by assigning a set of relations only to a corresponding query group. Quality-Aware Multi-Assignment enhances training signals by allowing multiple predictions closely matching the GT to be positively assigned. Additionally, we introduce dynamic query reallocation, which transfers queries from high- to low-performing groups for balanced training. Experimental results demonstrate that SpeaQ+, combining SpeaQ with dynamic query reallocation, consistently improves performance across seven baseline models on five benchmarks without additional inference cost.
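The two assignment ideas in the abstract can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (function and variable names are ours, not from the paper): cross-group query–relation pairs are masked out of the matching cost to enforce groupwise specialization, a greedy pass stands in for the usual Hungarian one-to-one matching, and high-quality in-group predictions above a threshold receive additional positive assignments.

```python
import numpy as np

def groupwise_multi_assign(cost, query_groups, gt_groups, quality, tau=0.7):
    """Hedged sketch of group-restricted, quality-aware label assignment.

    cost:         (Q, G) matching cost between Q queries and G GT relations
    query_groups: (Q,) group id of each query
    gt_groups:    (G,) group id of each GT relation (e.g., by relation class)
    quality:      (Q, G) prediction quality (e.g., IoU x class score)
    tau:          quality threshold for extra positive assignments
    Returns a boolean (Q, G) positive-assignment mask.
    """
    Q, G = cost.shape
    # Groupwise specialization: forbid cross-group matches by inflating cost.
    masked = cost.copy()
    masked[query_groups[:, None] != gt_groups[None, :]] = np.inf

    pos = np.zeros((Q, G), dtype=bool)
    # Base one-to-one assignment: greedy over finite costs
    # (a stand-in for the Hungarian matching used by DETR-style models).
    used_q, used_g = set(), set()
    for idx in np.argsort(masked, axis=None):
        q, g = divmod(int(idx), G)
        if not np.isfinite(masked[q, g]):
            break  # only infeasible (cross-group) pairs remain
        if q in used_q or g in used_g:
            continue
        pos[q, g] = True
        used_q.add(q)
        used_g.add(g)

    # Quality-aware multi-assignment: extra positives for high-quality
    # in-group predictions, instead of labeling them 'no relation'.
    extra = (quality >= tau) & (query_groups[:, None] == gt_groups[None, :])
    return pos | extra
```

With two query groups and one GT relation per group, a query that narrowly loses the one-to-one matching but has quality above `tau` still becomes a positive, which is the extra training signal the abstract describes.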
Journal introduction:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the information sciences, along with a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.