End-to-end transformer-based detection with density-guided query selection for small objects

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing Pub Date : 2025-09-16 DOI:10.1016/j.neucom.2025.131554

Nguyen Hoanh , Tran Vu Pham

{"title":"End-to-end transformer-based detection with density-guided query selection for small objects","authors":"Nguyen Hoanh , Tran Vu Pham","doi":"10.1016/j.neucom.2025.131554","DOIUrl":null,"url":null,"abstract":"<div><div>Small object detection remains a persistent challenge in transformer-based detectors due to their limited localization precision and reliance on fixed query mechanisms. In this paper, we propose Hybrid Density-Transformer (HyDeTr), a novel transformer-based object detection framework designed to improve the detection of small and densely packed objects with only a slight trade-off in inference complexity. HyDeTr introduces several key innovations: (1) a Context-Selective Hybrid Attention Encoder (CS-HAE) that distills global context from low-resolution features through efficient kernelized attention while preserving local detail via deformable attention on higher-resolution maps; (2) a Density Map Prediction module that generates a spatial prior highlighting high-object-density regions, facilitating focus on crowded scenes; (3) a Density-Guided Uncertainty-Minimal Query Selection strategy that identifies the most informative query locations based on both classification confidence and predicted density, ensuring that even low-confidence small objects in dense areas are effectively queried; and (4) an improved Query Formulation with dual embeddings, consisting of a content embedding and a 4D anchor box, refined iteratively by the decoder. Our design enables precise, density-aware query initialization and scale adaptation, leading to improved recall and accuracy for small objects. Extensive evaluations demonstrate that HyDeTr outperforms existing methods in detecting small objects, offering significant accuracy gains with only a modest increase in inference complexity, thereby maintaining near real-time performance and full end-to-end trainability.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"656 ","pages":"Article 131554"},"PeriodicalIF":6.5000,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S092523122502226X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Small object detection remains a persistent challenge in transformer-based detectors due to their limited localization precision and reliance on fixed query mechanisms. In this paper, we propose Hybrid Density-Transformer (HyDeTr), a novel transformer-based object detection framework designed to improve the detection of small and densely packed objects with only a slight trade-off in inference complexity. HyDeTr introduces several key innovations: (1) a Context-Selective Hybrid Attention Encoder (CS-HAE) that distills global context from low-resolution features through efficient kernelized attention while preserving local detail via deformable attention on higher-resolution maps; (2) a Density Map Prediction module that generates a spatial prior highlighting high-object-density regions, facilitating focus on crowded scenes; (3) a Density-Guided Uncertainty-Minimal Query Selection strategy that identifies the most informative query locations based on both classification confidence and predicted density, ensuring that even low-confidence small objects in dense areas are effectively queried; and (4) an improved Query Formulation with dual embeddings, consisting of a content embedding and a 4D anchor box, refined iteratively by the decoder. Our design enables precise, density-aware query initialization and scale adaptation, leading to improved recall and accuracy for small objects. Extensive evaluations demonstrate that HyDeTr outperforms existing methods in detecting small objects, offering significant accuracy gains with only a modest increase in inference complexity, thereby maintaining near real-time performance and full end-to-end trainability.

查看原文本刊更多论文

基于端到端变压器的小对象密度导向查询选择检测

由于变压器检测器的定位精度有限且依赖于固定的查询机制，小目标检测一直是变压器检测器面临的挑战。在本文中，我们提出了混合密度变压器（HyDeTr），这是一种新颖的基于变压器的目标检测框架，旨在提高对小而密集的目标的检测，同时仅在推理复杂性方面略有折衷。HyDeTr引入了几个关键创新：(1)上下文选择性混合注意编码器（CS-HAE），通过高效的核化注意从低分辨率特征中提取全局上下文，同时通过在高分辨率地图上的可变形注意保留局部细节；(2)密度地图预测模块，生成高目标密度区域的空间先验，便于对拥挤场景的关注；(3)密度导向的最小不确定性查询选择策略，该策略基于分类置信度和预测密度识别最具信息量的查询位置，确保在密集区域中即使是低置信度的小对象也能被有效查询；(4)由解码器迭代改进的包含内容嵌入和4D锚盒的双嵌入的改进查询公式。我们的设计实现了精确的、密度感知的查询初始化和规模适应，从而提高了小对象的召回率和准确性。广泛的评估表明，HyDeTr在检测小物体方面优于现有的方法，在仅适度增加推理复杂性的情况下提供了显着的精度提高，从而保持了接近实时的性能和完全的端到端可训练性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Neurocomputing 工程技术-计算机：人工智能

CiteScore

13.10

自引率

10.00%

发文量

1382

审稿时长

70 days

期刊介绍： Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.