{"title":"VLF-SAR: A Novel Vision-Language Framework for Few-Shot SAR Target Recognition","authors":"Nishang Xie;Tao Zhang;Lanyu Zhang;Jie Chen;Feiming Wei;Wenxian Yu","doi":"10.1109/TCSVT.2025.3558801","DOIUrl":null,"url":null,"abstract":"Due to the challenges of obtaining data from valuable targets, few-shot learning plays a critical role in synthetic aperture radar (SAR) target recognition. However, the high noise levels and complex backgrounds inherent in SAR data make this technology difficult to implement. To improve the recognition accuracy, in this paper, we propose a novel vision-language framework, VLF-SAR, with two specialized models: VLF-SAR-P for polarimetric SAR (PolSAR) data and VLF-SAR-T for traditional SAR data. Both models start with a frequency embedded module (FEM) to generate key structural features. For VLF-SAR-P, a polarimetric feature selector (PFS) is further introduced to identify the most relevant polarimetric features. Also, a novel adaptive multimodal triple attention mechanism (AMTAM) is designed to facilitate dynamic interactions between different kinds of features. For VLF-SAR-T, after FEM, a multimodal fusion attention mechanism (MFAM) is correspondingly proposed to fuse and adapt information extracted from frozen contrastive language-image pre-training (CLIP) encoders across different modalities. Extensive experiments on the OpenSARShip2.0, FUSAR-Ship, and SAR-AirCraft-1.0 datasets demonstrate the superiority of VLF-SAR over some state-of-the-art methods, offering a promising approach for few-shot SAR target recognition.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9530-9544"},"PeriodicalIF":11.1000,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10960691/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Due to the difficulty of obtaining data on valuable targets, few-shot learning plays a critical role in synthetic aperture radar (SAR) target recognition. However, the high noise levels and complex backgrounds inherent in SAR data make this task challenging. To improve recognition accuracy, in this paper we propose a novel vision-language framework, VLF-SAR, with two specialized models: VLF-SAR-P for polarimetric SAR (PolSAR) data and VLF-SAR-T for traditional SAR data. Both models start with a frequency embedded module (FEM) to generate key structural features. For VLF-SAR-P, a polarimetric feature selector (PFS) is further introduced to identify the most relevant polarimetric features, and a novel adaptive multimodal triple attention mechanism (AMTAM) is designed to facilitate dynamic interactions among the different kinds of features. For VLF-SAR-T, after the FEM, a multimodal fusion attention mechanism (MFAM) is proposed to fuse and adapt information extracted from frozen contrastive language-image pre-training (CLIP) encoders across different modalities. Extensive experiments on the OpenSARShip2.0, FUSAR-Ship, and SAR-AirCraft-1.0 datasets demonstrate the superiority of VLF-SAR over state-of-the-art methods, offering a promising approach for few-shot SAR target recognition.
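
To make the fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of cross-modal attention fusion in the spirit of the MFAM described above: token features from frozen CLIP-style image and text encoders are combined via cross-attention with a residual connection. The module name FusionAttention, the tensor shapes, and the choice of nn.MultiheadAttention are illustrative assumptions on our part; the abstract does not specify the authors' actual implementation.

import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    """Cross-attention fusion: image tokens attend to text tokens
    (an illustrative stand-in for the paper's MFAM, not the authors' code)."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # img_tokens: (B, N_img, D) visual tokens; txt_tokens: (B, N_txt, D) text tokens
        fused, _ = self.attn(query=img_tokens, key=txt_tokens, value=txt_tokens)
        # Residual connection preserves the original visual features; LayerNorm stabilizes training.
        return self.norm(img_tokens + fused)

# Stand-ins for the outputs of frozen CLIP encoders (in practice, one would load
# a pretrained CLIP model and freeze its parameters with requires_grad_(False)).
img_tokens = torch.randn(4, 49, 512)  # e.g., 7x7 patch tokens from the image encoder
txt_tokens = torch.randn(4, 16, 512)  # e.g., word tokens from the text encoder

fusion = FusionAttention(dim=512, heads=8)
out = fusion(img_tokens, txt_tokens)
print(out.shape)  # torch.Size([4, 49, 512])

Keeping the pretrained encoders frozen and training only a light fusion module of this kind is a common strategy in few-shot settings, since it limits the number of trainable parameters that must be fit from scarce labeled SAR samples.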
About the Journal
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.