End-to-end transformer-based detection with density-guided query selection for small objects
Authors: Nguyen Hoanh, Tran Vu Pham
Journal: Neurocomputing, Volume 656, Article 131554
Publication date: 2025-09-16
Publication type: Journal Article
DOI: 10.1016/j.neucom.2025.131554
URL: https://www.sciencedirect.com/science/article/pii/S092523122502226X
Impact factor: 6.5 (JCR Q1, Computer Science, Artificial Intelligence)
Open access: No
Citations: 0
Abstract
Small object detection remains a persistent challenge in transformer-based detectors due to their limited localization precision and reliance on fixed query mechanisms. In this paper, we propose Hybrid Density-Transformer (HyDeTr), a novel transformer-based object detection framework designed to improve the detection of small and densely packed objects with only a slight trade-off in inference complexity. HyDeTr introduces several key innovations: (1) a Context-Selective Hybrid Attention Encoder (CS-HAE) that distills global context from low-resolution features through efficient kernelized attention while preserving local detail via deformable attention on higher-resolution maps; (2) a Density Map Prediction module that generates a spatial prior highlighting high-object-density regions, facilitating focus on crowded scenes; (3) a Density-Guided Uncertainty-Minimal Query Selection strategy that identifies the most informative query locations based on both classification confidence and predicted density, ensuring that even low-confidence small objects in dense areas are effectively queried; and (4) an improved Query Formulation with dual embeddings, consisting of a content embedding and a 4D anchor box, refined iteratively by the decoder. Our design enables precise, density-aware query initialization and scale adaptation, leading to improved recall and accuracy for small objects. Extensive evaluations demonstrate that HyDeTr outperforms existing methods in detecting small objects, offering significant accuracy gains with only a modest increase in inference complexity, thereby maintaining near real-time performance and full end-to-end trainability.
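The density-guided selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule, so the linear blend of confidence and density, the weight `alpha`, and the function name `select_queries` are all assumptions made for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence


def select_queries(
    cls_conf: Sequence[float],
    density: Sequence[float],
    k: int,
    alpha: float = 0.5,
) -> List[int]:
    """Return the indices of the k highest-scoring feature locations.

    ASSUMPTION: the score is a linear blend,
        score = (1 - alpha) * confidence + alpha * density,
    chosen only for illustration. The paper's actual rule is not stated
    in the abstract.
    """
    if len(cls_conf) != len(density):
        raise ValueError("cls_conf and density must have the same length")
    scores = [(1.0 - alpha) * c + alpha * d for c, d in zip(cls_conf, density)]
    # Sort location indices by descending score and keep the first k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Four candidate locations: index 1 has low confidence but lies in a dense region.
conf = [0.9, 0.2, 0.8, 0.1]
dens = [0.1, 0.9, 0.1, 0.0]

confidence_only = select_queries(conf, dens, k=2, alpha=0.0)
density_guided = select_queries(conf, dens, k=2, alpha=0.5)
print(confidence_only, density_guided)
```

With `alpha=0.0` the selection reduces to plain top-k by confidence and skips location 1. With `alpha=0.5` the dense-region location is selected despite its low confidence, which is the behavior the abstract attributes to density-guided selection.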
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.