Military Aircraft Recognition Method Based on Attention Mechanism in Remote Sensing Images

IF 2; Zone 4 (Computer Science); JCR Q3 (Computer Science, Artificial Intelligence)

IET Image Processing, vol. 19, no. 1. Pub Date: 2025-04-09. DOI: 10.1049/ipr2.70069

Kun Liu, Zhengfan Xu, Yang Liu, Guofeng Xu


Citations: 0

Abstract

Remote sensing images play a crucial role in fields such as reconnaissance, early warning and intelligence analysis. Because of factors such as climate, season, lighting, occlusion and even atmospheric scattering during image acquisition, targets of the same model exhibit significant intra-class variability. This article applies deep learning to military aircraft recognition in remote sensing images and proposes an attention-based recognition algorithm built on You Only Look Once Version 8 Small (YOLOv8s): YOLOv8s-TDP (YOLOv8s + TripletAttention + dysample + PIoU). First, the TripletAttention module is used in the neck network; it captures cross-dimensional interactions and uses a three-branch structure to calculate attention weights, which further enhances the network's ability to preserve details and restore colours during image fusion. Second, an efficient dynamic upsampler, dysample, performs dynamic upsampling through point sampling, which mitigates the detail loss, jagged edges and image distortion that may occur with nearest-neighbour interpolation. Finally, the original loss function is replaced with PIoU (Pixels Intersection over Union), which calculates IoU (Intersection over Union) at the pixel level to capture small overlapping areas more accurately, reduce the missed-detection rate and improve accuracy. On the publicly available Remote Sensing Image Military Aircraft Target Recognition Dataset (MAR20), the proposed YOLOv8s-TDP model achieves a precision of 82.96%, a recall of 80.71%, an $\mathrm{mAP}_{0.5}$ of 87.11% and an $\mathrm{mAP}_{0.5-0.95}$ of 65.88%, outperforming the original YOLOv8s model, the RT-DETR model, and the YOLOv5, YOLOv7 and YOLOv11 series models. Compared with the original YOLOv8s model, YOLOv8s-TDP improves precision by 0.23%, recall by 2.61%, $\mathrm{mAP}_{0.5}$ by 2.76% and $\mathrm{mAP}_{0.5-0.95}$ by 2.49%, which indicates that the algorithm performs well for military aircraft recognition in remote sensing images.
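To make the idea of computing IoU "at the pixel level" concrete, the following is a minimal sketch that compares the usual closed-form box IoU with an IoU obtained by counting pixels. It is not the PIoU loss used in the paper: it assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`, uses hard pixel membership (which is not differentiable and so could not serve directly as a training loss), and the function names `box_iou` and `pixel_iou` are hypothetical, chosen only for this illustration.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Closed-form IoU of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pixel_iou(a: Box, b: Box, width: int, height: int) -> float:
    """IoU estimated by counting pixels whose centres lie inside each box.

    Illustrative only: membership is a hard 0/1 test, so this function
    has no useful gradient with respect to the box coordinates.
    """
    def inside(box: Box, cx: float, cy: float) -> bool:
        return box[0] <= cx < box[2] and box[1] <= cy < box[3]

    inter = 0
    union = 0
    for y in range(height):
        for x in range(width):
            cx, cy = x + 0.5, y + 0.5  # centre of pixel (x, y)
            in_a = inside(a, cx, cy)
            in_b = inside(b, cx, cy)
            inter += int(in_a and in_b)
            union += int(in_a or in_b)
    return inter / union if union else 0.0


if __name__ == "__main__":
    pred = (0.0, 0.0, 4.0, 4.0)
    target = (2.0, 2.0, 6.0, 6.0)
    print(box_iou(pred, target))          # intersection 4, union 28
    print(pixel_iou(pred, target, 8, 8))  # same ratio on an 8x8 grid
```

For these two boxes both functions return 4/28 = 1/7, because the box edges fall exactly on pixel boundaries. The two values diverge when edges cut through pixels, and pixel counting extends naturally to shapes without a closed-form intersection.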




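Source journal: IET Image Processing (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 5.40

Self-citation rate: 8.70%

Articles published: 282

Review time: 6 months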

Journal description: The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures, as well as innovative applications.

Principal topics include:
Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.
Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.
Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.
Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.
Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.
Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.

Current Special Issue Call for Papers:
Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf
AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf
Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf
Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf