FII-DETR: Few-shot object detection with fully information interaction

IF 15.5 · CAS Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)

Information Fusion Pub Date : 2025-09-14 DOI:10.1016/j.inffus.2025.103728

Kun Ren, Zhengzhen Li, Yongping Du, Honggui Han, Yufeng Wu

Information Fusion, Volume 127, Article 103728. Journal article; not open access. https://www.sciencedirect.com/science/article/pii/S1566253525007900

Citations: 0

Abstract

Few-shot object detection (FSOD) aims to classify and localize objects in images using only a few annotated samples. Recent meta-learning-based DETR approaches achieve promising performance on FSOD tasks. However, distinguishing easily confused categories remains a critical challenge, particularly when objects are occluded or small. To tackle this problem, we propose FII-DETR, a meta-learning FSOD model built upon Deformable DETR that focuses on full information interaction. First, an Adaptive Foreground Enhancement (AFE) module adaptively enhances important information and edge-aware representations in support images, enabling the model to capture discriminative features more effectively. Second, a Multiscale Local Information Fusion (MLIF) module and a Global Symmetric Aggregation (GSA) module strengthen local information interaction and aggregate support and query features from both local and global perspectives. In addition, we introduce self-supervised pretraining (SSP) into the meta-learning FSOD framework to further improve FII-DETR’s generalization capability by maximizing the mutual information of prior knowledge. We comprehensively evaluate FII-DETR on the PASCAL VOC and MS COCO benchmarks. Averaged over the three PASCAL VOC splits, FII-DETR outperforms the state-of-the-art FM-FSOD by 3%, Meta-DeDETR by 2.6%, and Meta-DETR by 6.5%. On the COCO dataset, FII-DETR outperforms Meta-DETR and Meta-DeDETR, and is also superior to FM-FSOD in the 1-shot and 3-shot settings. This work demonstrates that full information interaction and aggregation provide effective and robust support for improving the performance of DETR-based FSOD.
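To make the "1-shot" and "3-shot" settings mentioned above concrete, the following is a minimal sketch of how a K-shot support set is commonly drawn in FSOD: at most K annotated instances are kept per class. The function name `sample_k_shot_support`, the flat `(image_id, class_name, bbox)` tuple layout, and the toy values are all assumptions made for illustration; they are not taken from the paper or from any benchmark's official tooling.

```python
import random
from collections import defaultdict


def sample_k_shot_support(annotations, k, seed=0):
    """Keep at most k annotated instances per class (hypothetical helper).

    `annotations` is a list of (image_id, class_name, bbox) tuples. This flat
    layout is an assumption for the sketch, not a real dataset format.
    """
    rng = random.Random(seed)  # local RNG so the sampling is reproducible

    # Group annotated instances by their class label.
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann[1]].append(ann)

    # Draw k instances per class without replacement; if a class has fewer
    # than k instances, keep all of them.
    support = {}
    for cls in sorted(by_class):
        items = by_class[cls]
        support[cls] = rng.sample(items, min(k, len(items)))
    return support


# Toy annotations with made-up boxes (x1, y1, x2, y2).
toy_annotations = [
    ("img1", "bird", (10, 20, 50, 60)),
    ("img2", "bird", (5, 5, 40, 40)),
    ("img3", "bird", (30, 10, 90, 70)),
    ("img1", "bus", (0, 0, 120, 80)),
    ("img4", "bus", (15, 25, 200, 150)),
]

one_shot = sample_k_shot_support(toy_annotations, k=1)
three_shot = sample_k_shot_support(toy_annotations, k=3)
```

In this toy example the 3-shot set for "bus" contains only two instances, because the sketch falls back to all available annotations when a class has fewer than K.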


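Source journal: Information Fusion (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Number of published articles: 161

Review time: 7.9 months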

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.