MIR-YOLO: Remote sensing small target detection network based on visible-infrared dual modality
Authors: Jinli Zhong, Jianxun Zhang
Journal: Digital Signal Processing, vol. 162, Article 105158 (published 2025-03-14)
DOI: 10.1016/j.dsp.2025.105158
URL: https://www.sciencedirect.com/science/article/pii/S1051200425001800
Remote sensing detection of small targets is of great significance in traffic monitoring and in military target localization and identification. However, because small-scale targets occupy few pixels, they lack physical features such as texture and shape, and their effective information is also lost during forward propagation through the network. This leads to inappropriate gradient updates, which in turn degrade detection accuracy. To this end, this paper introduces a Multi-order Gated Aggregation module based on Inverted Residual (MIR). Designed around the concept of manifolds of interest, this module adapts to multi-scale variations and mitigates information loss for small-scale targets. Furthermore, we exploit the complementary strengths of multi-modal information by designing a multistage dual-modality fusion framework, which improves detection accuracy. To address the complexity and diversity of remote sensing scenes, this paper proposes a Gradient Path-Based Vision LSTM (GViL) block, which combines the efficiency of gradient path analysis with the context-modeling capability of Vision LSTM (ViL). We evaluate the model on the multi-modal remote sensing datasets VEDAI and Dronevehicle. On the VEDAI dataset, the mAP50 of our model is 5.8% higher than that of the baseline model (YOLOv8s) and 3.7% higher than that of YOLOv9, this year's state-of-the-art target detection method.
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy