End-to-end transformer-based detection with density-guided query selection for small objects

IF 6.5; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Nguyen Hoanh, Tran Vu Pham
Neurocomputing, Volume 656, Article 131554 (journal article). Publication date: 2025-09-16.
DOI: 10.1016/j.neucom.2025.131554
https://www.sciencedirect.com/science/article/pii/S092523122502226X
Citations: 0

Abstract

Small object detection remains a persistent challenge in transformer-based detectors due to their limited localization precision and reliance on fixed query mechanisms. In this paper, we propose Hybrid Density-Transformer (HyDeTr), a novel transformer-based object detection framework designed to improve the detection of small and densely packed objects with only a slight trade-off in inference complexity. HyDeTr introduces several key innovations: (1) a Context-Selective Hybrid Attention Encoder (CS-HAE) that distills global context from low-resolution features through efficient kernelized attention while preserving local detail via deformable attention on higher-resolution maps; (2) a Density Map Prediction module that generates a spatial prior highlighting high-object-density regions, facilitating focus on crowded scenes; (3) a Density-Guided Uncertainty-Minimal Query Selection strategy that identifies the most informative query locations based on both classification confidence and predicted density, ensuring that even low-confidence small objects in dense areas are effectively queried; and (4) an improved Query Formulation with dual embeddings, consisting of a content embedding and a 4D anchor box, refined iteratively by the decoder. Our design enables precise, density-aware query initialization and scale adaptation, leading to improved recall and accuracy for small objects. Extensive evaluations demonstrate that HyDeTr outperforms existing methods in detecting small objects, offering significant accuracy gains with only a modest increase in inference complexity, thereby maintaining near real-time performance and full end-to-end trainability.
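
The abstract says CS-HAE distills global context from low-resolution features with "efficient kernelized attention" but does not name the kernel. The sketch below assumes the ELU(x)+1 feature map from linear-transformer work (Katharopoulos et al., 2020) and treats the low-resolution map as a flat list of tokens. All function names are hypothetical. It shows why the kernel trick is cheap: keys and values are summarized once, so no N x M attention matrix is formed, yet the result matches the explicit quadratic computation with the same kernel.

```python
import math

def elu_plus_one(x):
    # Positive feature map phi(x) = ELU(x) + 1 (assumed kernel, not confirmed by the paper)
    return x + 1.0 if x > 0 else math.exp(x)

def _feature_map(rows):
    return [[elu_plus_one(x) for x in row] for row in rows]

def linear_attention(queries, keys, values):
    """Kernelized attention whose cost grows linearly with the number of tokens.

    queries: N x d, keys: M x d, values: M x d_v (lists of lists of floats).
    """
    phi_q, phi_k = _feature_map(queries), _feature_map(keys)
    d, d_v = len(phi_k[0]), len(values[0])
    # Summarize all keys once: S = sum_j phi(k_j) v_j^T (d x d_v), z = sum_j phi(k_j) (d)
    S = [[0.0] * d_v for _ in range(d)]
    z = [0.0] * d
    for fk, v in zip(phi_k, values):
        for a in range(d):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    # Each query reads the shared summary instead of attending to every key
    out = []
    for fq in phi_q:
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out

def quadratic_reference(queries, keys, values):
    """Same kernel computed with an explicit N x M weight matrix, for checking."""
    phi_q, phi_k = _feature_map(queries), _feature_map(keys)
    d_v = len(values[0])
    out = []
    for fq in phi_q:
        weights = [sum(a * b for a, b in zip(fq, fk)) for fk in phi_k]
        total = sum(weights)
        out.append([sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(d_v)])
    return out
```

Because every weight phi(q)·phi(k) is positive, each output row is a convex combination of the value rows, just as with softmax attention; only the similarity function differs.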
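
The abstract does not say how the Density Map Prediction module is supervised. One common convention from crowd counting, assumed here and not confirmed by the paper, is to place a Gaussian at each object center and normalize it so the target map sums to the number of objects; crowded regions then receive high values. `gaussian_density_map` and `sigma` are hypothetical names.

```python
import math

def gaussian_density_map(centers, height, width, sigma=1.0):
    """Return an H x W density target with one normalized Gaussian per object.

    centers: (x, y) object centers in feature-map cell coordinates (assumed inside the grid).
    Each Gaussian is renormalized over the grid, so the map sums to the object count.
    """
    density = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        kernel = [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(sum(row) for row in kernel)
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density

# Usage: two nearby objects and one isolated object on an 8 x 8 feature map
dm = gaussian_density_map([(1.0, 1.0), (1.5, 1.0), (6.0, 6.0)], height=8, width=8)
print(round(sum(sum(row) for row in dm), 6))  # → 3.0
```

The cell between the two nearby objects accumulates mass from both Gaussians, which is the kind of spatial prior the abstract describes for steering attention toward crowded scenes.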
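
The abstract states that query locations are chosen from both classification confidence and predicted density, but gives no scoring rule. The sketch below assumes a simple linear blend of confidence and max-normalized density followed by top-k selection; `density_guided_topk` and its weight `alpha` are hypothetical, and the uncertainty term of the strategy is not modeled. It illustrates the stated goal: a low-confidence token in a dense region can outrank a slightly more confident token in an empty one.

```python
import heapq

def density_guided_topk(confidences, densities, k, alpha=0.5):
    """Pick k encoder tokens by blending classification confidence with a density prior.

    confidences: per-token max class probability in [0, 1]
    densities:   per-token predicted density (non-negative, any scale)
    alpha:       hypothetical weight on the density prior (0.0 = confidence-only top-k)
    """
    if len(confidences) != len(densities):
        raise ValueError("confidences and densities must have the same length")
    # Rescale density to [0, 1] so both terms are comparable; use 1.0 for an all-zero map
    peak = max(densities) or 1.0
    scores = [(1.0 - alpha) * c + alpha * (d / peak) for c, d in zip(confidences, densities)]
    # Indices of the k highest blended scores, best first
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])

# Usage: token 1 is less confident than token 2 but sits in a much denser region
confidences = [0.90, 0.35, 0.40, 0.10]
densities = [0.2, 4.0, 0.0, 3.0]
print(density_guided_topk(confidences, densities, k=2))             # → [1, 0]
print(density_guided_topk(confidences, densities, k=2, alpha=0.0))  # → [0, 2]
```

Setting `alpha=0.0` recovers plain confidence-based top-k selection, which makes the effect of the density prior easy to compare.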
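
Each query pairs a content embedding with a 4D anchor box that the decoder refines layer by layer. The abstract does not give the update rule; this sketch assumes the inverse-sigmoid offset update used for iterative box refinement in Deformable DETR and DAB-DETR, and covers only the anchor, not the content embedding. The per-layer offsets stand in for a hypothetical regression head's output, and all function names are illustrative.

```python
import math

def inverse_sigmoid(x, eps=1e-5):
    # Clamp so boxes touching the image border do not produce log(0)
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def refine_anchor(box, delta):
    """One decoder layer's update of a normalized (cx, cy, w, h) anchor.

    delta holds four offsets predicted by that layer (hypothetical head output);
    they are added in logit space so the refined box stays normalized.
    """
    return tuple(sigmoid(inverse_sigmoid(b) + d) for b, d in zip(box, delta))

def run_decoder(anchor, layer_deltas):
    """Apply each layer's offsets in turn; returns the anchor after every layer."""
    boxes = [tuple(anchor)]
    for delta in layer_deltas:
        boxes.append(refine_anchor(boxes[-1], delta))
    return boxes

# Usage: an anchor initialized at a selected query location, refined by two layers
for b in run_decoder((0.5, 0.5, 0.1, 0.1), [(0.2, -0.1, 0.3, 0.3), (0.05, 0.0, -0.1, 0.1)]):
    print(tuple(round(v, 4) for v in b))
```

Working in logit space means a zero offset leaves the box unchanged and small offsets produce proportionally small moves for tiny boxes, which suits the fine scale adjustments small objects need.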

Source journal: Neurocomputing (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its core topics are neurocomputing theory, practice, and applications.