More signals matter to detection: Integrating language knowledge and frequency representations for boosting fine-grained aircraft recognition

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xueru Xu, Zhong Chen, Yuxin Hu, Guoyou Wang
Neural Networks, vol. 187, Article 107402. Published 2025-03-21. Journal Article. DOI: 10.1016/j.neunet.2025.107402. https://www.sciencedirect.com/science/article/pii/S0893608025002813
Citations: 0

Abstract

As object detection advances rapidly, fine-grained detection has emerged as a promising extension. Fine-grained recognition naturally demands high-quality detail signals; however, existing fine-grained detectors, built on the mainstream detection paradigm, struggle to address both insufficient original signals and the loss of critical signals, which results in inferior performance. We argue that language signals carrying advanced semantic knowledge can provide valuable information about fine-grained objects, and that the frequency domain offers greater flexibility in suppressing and enhancing signals. We therefore propose a fine-grained aircraft detector that integrates language knowledge and frequency representations into the one-stage detection paradigm. Concretely, considering both original signals and deep feature signals, we develop three components, an adaptive frequency augmentation branch (AFAB), a content-aware global features intensifier (CGFI), and a fine-grained text–image interactive feeder (FTIF), to help perceive and retain critical signals throughout the pivotal detection stages. The AFAB adaptively processes image patches according to their frequency characteristics in the Fourier domain, thereby thoroughly mining critical visual content in the data space; the CGFI employs content-aware frequency filtering to enhance global features, yielding an information-rich feature space; the FTIF introduces text knowledge that describes visual differences among fine-grained categories, conveying robust semantic priors from language signals to visual spaces via multimodal interaction to supplement the available information. Extensive experiments on optical and SAR images demonstrate the superior performance of the proposed fine-grained detector. The FTIF in particular can be plugged into most existing one-stage detectors to significantly boost their fine-grained recognition performance.
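The abstract does not specify how the AFAB or CGFI are built, so the following is only a rough NumPy sketch of the general idea of treating image patches according to their frequency content in the Fourier domain. It is not the paper's method. The function names `high_frequency_ratio` and `boost_high_frequencies`, and the parameters `cutoff` and `gain`, are hypothetical and chosen purely for illustration.

```python
import numpy as np


def _radial_frequency(shape):
    """Normalized radial frequency for each FFT bin (unshifted layout)."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in cycles/pixel
    return np.sqrt(fx ** 2 + fy ** 2)


def high_frequency_ratio(patch, cutoff=0.25):
    """Share of spectral energy above `cutoff` (hypothetical patch descriptor)."""
    energy = np.abs(np.fft.fft2(patch)) ** 2
    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[_radial_frequency(patch.shape) > cutoff].sum() / total)


def boost_high_frequencies(patch, cutoff=0.25, gain=1.5):
    """Scale components above `cutoff` by `gain`; an assumed, fixed filter."""
    spectrum = np.fft.fft2(patch)
    scale = np.where(_radial_frequency(patch.shape) > cutoff, gain, 1.0)
    # The mask is symmetric in frequency, so a real input gives a real output.
    return np.real(np.fft.ifft2(spectrum * scale))


# Usage: a flat patch has no detail, a checkerboard is all fine detail.
flat = np.ones((8, 8))
rows, cols = np.indices((8, 8))
checker = (-1.0) ** (rows + cols)
flat_ratio = high_frequency_ratio(flat)
checker_ratio = high_frequency_ratio(checker)
sharpened = boost_high_frequencies(checker, gain=2.0)
```

A content-adaptive scheme would choose the filter per patch (for example, from a descriptor such as the ratio above) instead of using one fixed `gain`; how the paper makes that choice is not stated in the abstract.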
Source journal
Neural Networks (Engineering and Technology: Computer Science, Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.