Improved query specialization for transformer-based visual relationship detection

Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim

Information Sciences, Volume 723, Article 122668. DOI: 10.1016/j.ins.2025.122668. Published 2025-09-10. Available at https://www.sciencedirect.com/science/article/pii/S0020025525008011
Citations: 0
Abstract
Visual Relationship Detection (VRD) has significantly advanced with Transformer-based architectures. However, we identify two fundamental drawbacks in conventional label assignment methods used for training Transformer-based VRD models, where ground-truth (GT) annotations are matched to model predictions. In conventional assignment, queries are trained to detect all relations rather than specializing in specific ones, resulting in ‘unspecialized’ queries. Also, each ground-truth (GT) annotation is assigned to only one prediction under conventional assignment, suppressing other near-correct predictions by labeling them as ‘no relation’. To address these issues, we introduce a novel method called Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization clusters queries and relations into exclusive groups, promoting specialization by assigning a set of relations only to a corresponding query group. Quality-Aware Multi-Assignment enhances training signals by allowing multiple predictions closely matching the GT to be positively assigned. Additionally, we introduce dynamic query reallocation, which transfers queries from high- to low-performing groups for balanced training. Experimental results demonstrate that SpeaQ+, combining SpeaQ with dynamic query reallocation, consistently improves performance across seven baseline models on five benchmarks without additional inference cost.
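The two assignment ideas in the abstract can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (function and variable names are ours, not from the paper): cross-group query–relation pairs are masked out of the matching cost to enforce groupwise specialization, a greedy pass stands in for the usual Hungarian one-to-one matching, and high-quality in-group predictions above a threshold receive additional positive assignments.

```python
import numpy as np

def groupwise_multi_assign(cost, query_groups, gt_groups, quality, tau=0.7):
    """Hedged sketch of group-restricted, quality-aware label assignment.

    cost:         (Q, G) matching cost between Q queries and G GT relations
    query_groups: (Q,) group id of each query
    gt_groups:    (G,) group id of each GT relation (e.g., by relation class)
    quality:      (Q, G) prediction quality (e.g., IoU x class score)
    tau:          quality threshold for extra positive assignments
    Returns a boolean (Q, G) positive-assignment mask.
    """
    Q, G = cost.shape
    # Groupwise specialization: forbid cross-group matches by inflating cost.
    masked = cost.copy()
    masked[query_groups[:, None] != gt_groups[None, :]] = np.inf

    pos = np.zeros((Q, G), dtype=bool)
    # Base one-to-one assignment: greedy over finite costs
    # (a stand-in for the Hungarian matching used by DETR-style models).
    used_q, used_g = set(), set()
    for idx in np.argsort(masked, axis=None):
        q, g = divmod(int(idx), G)
        if not np.isfinite(masked[q, g]):
            break  # only infeasible (cross-group) pairs remain
        if q in used_q or g in used_g:
            continue
        pos[q, g] = True
        used_q.add(q)
        used_g.add(g)

    # Quality-aware multi-assignment: extra positives for high-quality
    # in-group predictions, instead of labeling them 'no relation'.
    extra = (quality >= tau) & (query_groups[:, None] == gt_groups[None, :])
    return pos | extra
```

With two query groups and one GT relation per group, a query that narrowly loses the one-to-one matching but has quality above `tau` still becomes a positive, which is the extra training signal the abstract describes.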
Journal introduction:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the information sciences, along with a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.