更多的信号对检测很重要：整合语言知识和频率表示以提高细粒度飞机识别

IF 6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks Pub Date : 2025-03-21 DOI:10.1016/j.neunet.2025.107402

Xueru Xu, Zhong Chen, Yuxin Hu, Guoyou Wang

{"title":"更多的信号对检测很重要：整合语言知识和频率表示以提高细粒度飞机识别","authors":"Xueru Xu, Zhong Chen, Yuxin Hu, Guoyou Wang","doi":"10.1016/j.neunet.2025.107402","DOIUrl":null,"url":null,"abstract":"<div><div>As object detection tasks progress rapidly, fine-grained detection flourishes as a promising extension. Fine-grained recognition naturally demands high-quality detail signals; however, existing fine-grained detectors, built upon the mainstream detection paradigm, struggle to simultaneously address the challenges of insufficient original signals and the loss of critical signals, resulting in inferior performance. We argue that language signals with advanced semantic knowledge can provide valuable information for fine-grained objects, as well as the frequency domain exhibits greater flexibility in suppressing and enhancing signals; then, we propose a fine-grained aircraft detector by integrating language knowledge and frequency representations into the one-stage detection paradigm. Concretely, by considering both original signals and deep feature signals, we develop three components, including an adaptive frequency augmentation branch (AFAB), a content-aware global features intensifier (CGFI), and a fine-grained text–image interactive feeder (FTIF), to facilitate perceiving and retaining critical signals throughout pivotal detection stages. The AFAB adaptively processes image patches according to their frequency characteristics in the Fourier domain, thus thoroughly mining critical visual content in the data space; the CGFI employs content-aware frequency filtering to enhance global features, allowing for generating an information-rich feature space; the FTIF introduces text knowledge to describe visual differences among fine-grained categories, conveying robust semantic priors from language signals to visual spaces via multimodal interaction for information supplement. Extensive experiments conducted on optical and SAR images demonstrate the superior performance of the proposed fine-grained detector, especially the FTIF, which can be plugged into most existing one-stage detectors to boost their fine-grained recognition performance significantly.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107402"},"PeriodicalIF":6.0000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"More signals matter to detection: Integrating language knowledge and frequency representations for boosting fine-grained aircraft recognition\",\"authors\":\"Xueru Xu, Zhong Chen, Yuxin Hu, Guoyou Wang\",\"doi\":\"10.1016/j.neunet.2025.107402\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>As object detection tasks progress rapidly, fine-grained detection flourishes as a promising extension. Fine-grained recognition naturally demands high-quality detail signals; however, existing fine-grained detectors, built upon the mainstream detection paradigm, struggle to simultaneously address the challenges of insufficient original signals and the loss of critical signals, resulting in inferior performance. We argue that language signals with advanced semantic knowledge can provide valuable information for fine-grained objects, as well as the frequency domain exhibits greater flexibility in suppressing and enhancing signals; then, we propose a fine-grained aircraft detector by integrating language knowledge and frequency representations into the one-stage detection paradigm. Concretely, by considering both original signals and deep feature signals, we develop three components, including an adaptive frequency augmentation branch (AFAB), a content-aware global features intensifier (CGFI), and a fine-grained text–image interactive feeder (FTIF), to facilitate perceiving and retaining critical signals throughout pivotal detection stages. The AFAB adaptively processes image patches according to their frequency characteristics in the Fourier domain, thus thoroughly mining critical visual content in the data space; the CGFI employs content-aware frequency filtering to enhance global features, allowing for generating an information-rich feature space; the FTIF introduces text knowledge to describe visual differences among fine-grained categories, conveying robust semantic priors from language signals to visual spaces via multimodal interaction for information supplement. Extensive experiments conducted on optical and SAR images demonstrate the superior performance of the proposed fine-grained detector, especially the FTIF, which can be plugged into most existing one-stage detectors to boost their fine-grained recognition performance significantly.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"187 \",\"pages\":\"Article 107402\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025002813\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025002813","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

随着目标检测任务的快速发展，细粒度检测作为一种有前途的扩展而蓬勃发展。细粒度识别自然需要高质量的细节信号；然而，现有的细粒度检测器建立在主流检测范式的基础上，难以同时解决原始信号不足和关键信号丢失的挑战，导致性能较差。我们认为具有高级语义知识的语言信号可以为细粒度对象提供有价值的信息，并且频域在抑制和增强信号方面表现出更大的灵活性；然后，我们通过将语言知识和频率表示集成到一阶段检测范式中，提出了一种细粒度的飞机检测器。具体而言，通过考虑原始信号和深度特征信号，我们开发了三个组件，包括自适应频率增强分支（AFAB），内容感知全局特征增强器（CGFI）和细粒度文本图像交互馈线（FTIF），以促进在关键检测阶段感知和保留关键信号。AFAB根据图像块在傅里叶域中的频率特征自适应处理图像块，从而彻底挖掘数据空间中的关键视觉内容；CGFI采用内容感知频率滤波增强全局特征，生成信息丰富的特征空间；FTIF引入文本知识来描述细粒度类别之间的视觉差异，通过多模态交互向视觉空间传递从语言信号到视觉空间的鲁棒语义先验，进行信息补充。在光学和SAR图像上进行的大量实验证明了所提出的细粒度检测器的优越性能，特别是FTIF，可以插入到大多数现有的一级检测器中，以显着提高其细粒度识别性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

More signals matter to detection: Integrating language knowledge and frequency representations for boosting fine-grained aircraft recognition

As object detection tasks progress rapidly, fine-grained detection flourishes as a promising extension. Fine-grained recognition naturally demands high-quality detail signals; however, existing fine-grained detectors, built upon the mainstream detection paradigm, struggle to simultaneously address the challenges of insufficient original signals and the loss of critical signals, resulting in inferior performance. We argue that language signals with advanced semantic knowledge can provide valuable information for fine-grained objects, as well as the frequency domain exhibits greater flexibility in suppressing and enhancing signals; then, we propose a fine-grained aircraft detector by integrating language knowledge and frequency representations into the one-stage detection paradigm. Concretely, by considering both original signals and deep feature signals, we develop three components, including an adaptive frequency augmentation branch (AFAB), a content-aware global features intensifier (CGFI), and a fine-grained text–image interactive feeder (FTIF), to facilitate perceiving and retaining critical signals throughout pivotal detection stages. The AFAB adaptively processes image patches according to their frequency characteristics in the Fourier domain, thus thoroughly mining critical visual content in the data space; the CGFI employs content-aware frequency filtering to enhance global features, allowing for generating an information-rich feature space; the FTIF introduces text knowledge to describe visual differences among fine-grained categories, conveying robust semantic priors from language signals to visual spaces via multimodal interaction for information supplement. Extensive experiments conducted on optical and SAR images demonstrate the superior performance of the proposed fine-grained detector, especially the FTIF, which can be plugged into most existing one-stage detectors to boost their fine-grained recognition performance significantly.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Networks 工程技术-计算机：人工智能

CiteScore

13.90

自引率

7.70%

发文量

425

审稿时长

67 days

期刊介绍： Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.