改进了基于转换器的可视化关系检测的查询专门化

IF 6.8 1区 计算机科学 0 COMPUTER SCIENCE, INFORMATION SYSTEMS
Jongha Kim , Jihwan Park , Jinyoung Park , Jinyoung Kim , Sehyung Kim , Hyunwoo J. Kim
{"title":"改进了基于转换器的可视化关系检测的查询专门化","authors":"Jongha Kim ,&nbsp;Jihwan Park ,&nbsp;Jinyoung Park ,&nbsp;Jinyoung Kim ,&nbsp;Sehyung Kim ,&nbsp;Hyunwoo J. Kim","doi":"10.1016/j.ins.2025.122668","DOIUrl":null,"url":null,"abstract":"<div><div>Visual Relationship Detection (VRD) has significantly advanced with Transformer-based architectures. However, we identify two fundamental drawbacks in conventional label assignment methods used for training Transformer-based VRD models, where ground-truth (GT) annotations are matched to model predictions. In conventional assignment, queries are trained to detect all relations rather than specializing in specific ones, resulting in ‘unspecialized’ queries. Also, each ground-truth (GT) annotation is assigned to only one prediction under conventional assignment, suppressing other near-correct predictions by labeling them as ‘no relation’. To address these issues, we introduce a novel method called Groupwise Query <strong>Spe</strong>ci<strong>a</strong>lization and <strong>Q</strong>uality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization clusters queries and relations into exclusive groups, promoting specialization by assigning a set of relations only to a corresponding query group. Quality-Aware Multi-Assignment enhances training signals by allowing multiple predictions closely matching the GT to be positively assigned. Additionally, we introduce dynamic query reallocation, which transfers queries from high- to low-performing groups for balanced training. Experimental results demonstrate that SpeaQ+, combining SpeaQ with dynamic query reallocation, consistently improves performance across seven baseline models on five benchmarks without additional inference cost.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"723 ","pages":"Article 122668"},"PeriodicalIF":6.8000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improved query specialization for transformer-based visual relationship detection\",\"authors\":\"Jongha Kim ,&nbsp;Jihwan Park ,&nbsp;Jinyoung Park ,&nbsp;Jinyoung Kim ,&nbsp;Sehyung Kim ,&nbsp;Hyunwoo J. Kim\",\"doi\":\"10.1016/j.ins.2025.122668\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Visual Relationship Detection (VRD) has significantly advanced with Transformer-based architectures. However, we identify two fundamental drawbacks in conventional label assignment methods used for training Transformer-based VRD models, where ground-truth (GT) annotations are matched to model predictions. In conventional assignment, queries are trained to detect all relations rather than specializing in specific ones, resulting in ‘unspecialized’ queries. Also, each ground-truth (GT) annotation is assigned to only one prediction under conventional assignment, suppressing other near-correct predictions by labeling them as ‘no relation’. To address these issues, we introduce a novel method called Groupwise Query <strong>Spe</strong>ci<strong>a</strong>lization and <strong>Q</strong>uality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization clusters queries and relations into exclusive groups, promoting specialization by assigning a set of relations only to a corresponding query group. Quality-Aware Multi-Assignment enhances training signals by allowing multiple predictions closely matching the GT to be positively assigned. Additionally, we introduce dynamic query reallocation, which transfers queries from high- to low-performing groups for balanced training. Experimental results demonstrate that SpeaQ+, combining SpeaQ with dynamic query reallocation, consistently improves performance across seven baseline models on five benchmarks without additional inference cost.</div></div>\",\"PeriodicalId\":51063,\"journal\":{\"name\":\"Information Sciences\",\"volume\":\"723 \",\"pages\":\"Article 122668\"},\"PeriodicalIF\":6.8000,\"publicationDate\":\"2025-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Sciences\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0020025525008011\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0020025525008011","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

视觉关系检测(VRD)在基于变压器的体系结构中取得了显著的进步。然而,我们发现了用于训练基于transformer的VRD模型的传统标签分配方法的两个基本缺点,其中ground-truth (GT)注释与模型预测相匹配。在传统的分配中,查询被训练来检测所有关系,而不是专门针对特定的关系,从而导致“非专门化”查询。此外,在常规分配下,每个基础真值(GT)注释只分配给一个预测,通过将其他接近正确的预测标记为“无关系”来抑制它们。为了解决这些问题,我们引入了一种新的方法,称为分组查询专门化和质量感知多分配(SpeaQ)。分组查询专门化将查询和关系聚集到排他的组中,通过将一组关系仅分配给相应的查询组来促进专门化。通过允许多个与GT密切匹配的预测被正分配,质量感知多分配增强了训练信号。此外,我们引入了动态查询重新分配,它将查询从高性能组转移到低性能组以实现平衡训练。实验结果表明,将SpeaQ与动态查询重新分配相结合的SpeaQ+在没有额外推理成本的情况下,在五个基准测试中持续提高了七个基线模型的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Improved query specialization for transformer-based visual relationship detection
Visual Relationship Detection (VRD) has significantly advanced with Transformer-based architectures. However, we identify two fundamental drawbacks in conventional label assignment methods used for training Transformer-based VRD models, where ground-truth (GT) annotations are matched to model predictions. In conventional assignment, queries are trained to detect all relations rather than specializing in specific ones, resulting in ‘unspecialized’ queries. Also, each ground-truth (GT) annotation is assigned to only one prediction under conventional assignment, suppressing other near-correct predictions by labeling them as ‘no relation’. To address these issues, we introduce a novel method called Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization clusters queries and relations into exclusive groups, promoting specialization by assigning a set of relations only to a corresponding query group. Quality-Aware Multi-Assignment enhances training signals by allowing multiple predictions closely matching the GT to be positively assigned. Additionally, we introduce dynamic query reallocation, which transfers queries from high- to low-performing groups for balanced training. Experimental results demonstrate that SpeaQ+, combining SpeaQ with dynamic query reallocation, consistently improves performance across seven baseline models on five benchmarks without additional inference cost.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Information Sciences
Information Sciences 工程技术-计算机:信息系统
CiteScore
14.00
自引率
17.30%
发文量
1322
审稿时长
10.4 months
期刊介绍: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信