Kun Ren, Zhengzhen Li, Yongping Du, Honggui Han, Yufeng Wu
FII-DETR: Few-shot object detection with fully information interaction
Information Fusion, Volume 127, Article 103728. Published 2025-09-14. Impact factor: 15.5 (JCR Q1, Computer Science, Artificial Intelligence).
DOI: 10.1016/j.inffus.2025.103728
https://www.sciencedirect.com/science/article/pii/S1566253525007900
Citations: 0
Abstract
Few-shot object detection (FSOD) aims to classify and localize objects in images given only a few annotated samples. Recent meta-learning-based DETR approaches achieve promising performance on FSOD tasks. However, distinguishing easily confused categories remains a critical challenge, particularly when objects are occluded or small. To tackle this problem, we propose FII-DETR, a meta-learning FSOD model built upon Deformable DETR that focuses on full information interaction. First, an Adaptive Foreground Enhancement (AFE) module adaptively enhances important information and edge-aware representations in support images, enabling the model to capture discriminative features more effectively. Second, a Multiscale Local Information Fusion (MLIF) module and a Global Symmetric Aggregation (GSA) module enhance local information interaction and aggregate support and query features from local and global perspectives. In addition, we introduce self-supervised pretraining (SSP) into the meta-learning FSOD framework to further enhance FII-DETR’s generalization capability by maximizing the mutual information of prior knowledge. We comprehensively evaluate FII-DETR on the PASCAL VOC and MS COCO benchmarks. Averaged over the three PASCAL VOC splits, FII-DETR outperforms the state-of-the-art FM-FSOD by 3%, Meta-DeDETR by 2.6%, and Meta-DETR by 6.5%. On the COCO dataset, FII-DETR outperforms Meta-DETR and Meta-DeDETR and is also superior to FM-FSOD in the 1-shot and 3-shot settings. This work demonstrates that full information interaction and aggregation provide effective and robust support for improving the performance of DETR-based FSOD.
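The abstract does not describe the internals of the GSA module, so the following is only a generic illustration of what "symmetric" aggregation of support and query features could look like: a plain bidirectional cross-attention in which each side attends to the other and both updates are computed from the same original inputs. The function names (`cross_attend`, `symmetric_aggregate`), the residual update, and the toy feature values are assumptions made for this sketch and are not taken from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(target, source):
    """Hypothetical helper: add to each target token an attention-weighted
    mix of source tokens (scaled dot-product attention, residual update)."""
    scale = math.sqrt(len(target[0]))
    updated = []
    for t in target:
        weights = softmax([dot(t, s) / scale for s in source])
        mixed = [sum(w * s[k] for w, s in zip(weights, source))
                 for k in range(len(t))]
        updated.append([tv + mv for tv, mv in zip(t, mixed)])
    return updated

def symmetric_aggregate(query_feats, support_feats):
    """Hypothetical stand-in for a symmetric aggregation step: both
    directions are computed from the original, un-updated features."""
    new_query = cross_attend(query_feats, support_feats)
    new_support = cross_attend(support_feats, query_feats)
    return new_query, new_support

# Toy features: 3 query tokens and 2 support tokens, each 2-dimensional.
query_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
support_feats = [[1.0, 0.0], [0.0, 2.0]]
new_query, new_support = symmetric_aggregate(query_feats, support_feats)
print(len(new_query), len(new_support))
```

Computing both directions from the un-updated inputs keeps the exchange order-independent, which is one plausible reading of "symmetric"; the actual module may differ.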
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.