More signals matter to detection: Integrating language knowledge and frequency representations for boosting fine-grained aircraft recognition

IF 6 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks Pub Date : 2025-03-21 DOI:10.1016/j.neunet.2025.107402

Xueru Xu, Zhong Chen, Yuxin Hu, Guoyou Wang

Neural Networks, Volume 187, Article 107402. Full text: https://www.sciencedirect.com/science/article/pii/S0893608025002813

Citations: 0

Abstract

As object detection progresses rapidly, fine-grained detection has emerged as a promising extension. Fine-grained recognition naturally demands high-quality detail signals; however, existing fine-grained detectors, built upon the mainstream detection paradigm, struggle to address both insufficient original signals and the loss of critical signals, resulting in inferior performance. We argue that language signals carrying advanced semantic knowledge can provide valuable information about fine-grained objects, and that the frequency domain offers greater flexibility in suppressing and enhancing signals. We therefore propose a fine-grained aircraft detector that integrates language knowledge and frequency representations into the one-stage detection paradigm. Concretely, by considering both original signals and deep feature signals, we develop three components, an adaptive frequency augmentation branch (AFAB), a content-aware global features intensifier (CGFI), and a fine-grained text–image interactive feeder (FTIF), to help perceive and retain critical signals throughout the pivotal detection stages. The AFAB adaptively processes image patches according to their frequency characteristics in the Fourier domain, thus thoroughly mining critical visual content in the data space; the CGFI employs content-aware frequency filtering to enhance global features, yielding an information-rich feature space; the FTIF introduces text knowledge that describes visual differences among fine-grained categories, conveying robust semantic priors from language signals to visual spaces via multimodal interaction to supplement the visual information. Extensive experiments on optical and SAR images demonstrate the superior performance of the proposed fine-grained detector, and especially of the FTIF, which can be plugged into most existing one-stage detectors to boost their fine-grained recognition performance significantly.
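The abstract does not give the AFAB's formulation, so the following is only an illustrative sketch of what "processing image patches according to their frequency characteristics in the Fourier domain" could look like, not the authors' method. It assumes a grayscale image, non-overlapping square patches, and a simple rule: patches whose share of high-frequency energy exceeds a threshold get their high-frequency components amplified. The function names, `radius`, `threshold`, and `gain` are all hypothetical.

```python
import numpy as np


def _high_mask(shape, radius):
    """Boolean mask of FFT bins whose normalized frequency exceeds `radius`."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fy ** 2 + fx ** 2) > radius


def high_freq_ratio(patch, radius=0.25):
    """Fraction of the patch's spectral energy that lies above `radius`."""
    energy = np.abs(np.fft.fft2(patch)) ** 2
    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[_high_mask(patch.shape, radius)].sum() / total)


def boost_high_freq(patch, gain=1.5, radius=0.25):
    """Scale high-frequency bins by `gain`; the DC term (patch mean) is untouched."""
    spec = np.fft.fft2(patch)
    spec = np.where(_high_mask(patch.shape, radius), spec * gain, spec)
    return np.real(np.fft.ifft2(spec))


def augment_patches(image, patch=8, threshold=0.1, gain=1.5):
    """Hypothetical rule: boost only patches that are rich in fine detail."""
    out = image.astype(np.float64).copy()
    h, w = out.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            block = out[y:y + patch, x:x + patch]
            if high_freq_ratio(block) > threshold:
                out[y:y + patch, x:x + patch] = boost_high_freq(block, gain)
    return out
```

Under these assumptions a flat region passes through unchanged, while a textured patch has its fine detail amplified without shifting its mean intensity. A learned, content-dependent filter (as the abstract's "adaptive" and "content-aware" wording suggests) would replace the fixed threshold and gain used here.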


Source journal

Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore

13.90

Self-citation rate

7.70%

Articles published

425

Review time

67 days

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.