MIR-YOLO: Remote sensing small target detection network based on visible-infrared dual modality

IF 2.9 | Region 3 (Engineering & Technology) | JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Jinli Zhong, Jianxun Zhang
DOI: 10.1016/j.dsp.2025.105158
Journal: Digital Signal Processing, volume 162, Article 105158
Published: 2025-03-14
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S1051200425001800
Citations: 0

Abstract

Remote sensing detection of small targets is important for traffic monitoring and for military target localization and identification. However, because small-scale targets occupy few pixels, they lack physical features such as texture and shape, and their useful information is lost during the network's forward propagation. This leads to inappropriate gradient updates, which in turn reduce detection accuracy. To this end, this paper introduces a Multi-order Gated Aggregation module based on Inverted Residual (MIR). Designed around the concept of manifolds of interest, the module adapts to multi-scale variations and significantly mitigates information loss for small-scale targets. Furthermore, we exploit the complementary strengths of multi-modal information and design a multistage dual-modality fusion framework, which significantly improves detection accuracy. To address the complexity and diversity of remote sensing scenes, this paper proposes a Gradient Path-Based Vision LSTM (GViL) block, which combines the efficiency of gradient path analysis with the context-modeling capability of Vision LSTM (ViL). We evaluate the model on the multi-modal remote sensing datasets VEDAI and DroneVehicle. On the VEDAI dataset, the mAP50 of our model is 5.8% higher than that of the baseline model (YOLOv8s) and 3.7% higher than that of YOLOv9, this year's state-of-the-art object detection method.
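To make the abstract's point about small targets and the mAP50 metric concrete, the sketch below assumes the common convention that mAP50 counts a detection as correct when its intersection over union (IoU) with the ground-truth box is at least 0.5; the paper's exact evaluation protocol is not given here. The box sizes and the 2-pixel localization error are illustrative assumptions, not values from the paper. The same error that a large target easily tolerates pushes a small target below the threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx, dy):
    """Return the box translated by (dx, dy) pixels."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


# Hypothetical ground-truth boxes: an 8x8 small target and a 64x64 large one.
small = (0, 0, 8, 8)
large = (0, 0, 64, 64)
shift = 2  # assumed localization error in pixels, applied along both axes

small_iou = iou(small, shifted(small, shift, shift))
large_iou = iou(large, shifted(large, shift, shift))

print(f"small target IoU: {small_iou:.3f}")  # falls below the 0.5 threshold
print(f"large target IoU: {large_iou:.3f}")  # stays well above the 0.5 threshold
```

Under these assumptions the shifted prediction on the small box would count as a miss at IoU 0.5 while the same shift on the large box would count as a hit, which is one reason small-target detectors are sensitive to lost spatial detail.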
Source journal
Digital Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 5.30
Self-citation rate: 17.20%
Articles published: 435
Review time: 66 days
About the journal: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy