A deep neural network for small object detection in complex environments with unmanned aerial vehicle imagery

Authors: Sayed Jobaer, Xue-song Tang, Yihong Zhang
DOI: 10.1016/j.engappai.2025.110466
Journal: Engineering Applications of Artificial Intelligence, Volume 148, Article 110466 (JCR Q1, Automation & Control Systems; Impact Factor 8.0)
Publication date: 2025-03-08 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S095219762500466X
Deep learning-based object detectors perform effectively on edge devices but encounter challenges with small and flat objects in complex environments, especially under low-light conditions and in high-altitude images captured by unmanned aerial vehicles (UAVs). The primary issue is the pixel similarity between objects and their backgrounds, which makes detection challenging. While existing detectors struggle to detect small and flat objects in these scenarios, you only look once (YOLO) algorithms have shown promise; however, they still have limitations in detecting small and flat objects under these conditions. Due to a shortage of suitable datasets covering complex environments and lighting conditions, the field lacks comprehensive research on detecting small and flat objects in UAV-assisted images. To address these issues, we develop a dataset with nine classes tailored to small object detection (SOD) challenges. We propose a dynamic model based on the YOLOv5 (version 6.2) architecture to overcome the above-mentioned limitations. We introduce the Luna-enhancement mechanism and four novel modules, which enhance the detector's capacity to detect objects in complex environments. Our approach aims to improve the accuracy and robustness of detecting small and flat objects in complex environments, benefiting applications such as aerial surveillance, search and rescue, and autonomous navigation. The experimental results demonstrate that our proposed model achieves a mean average precision (mAP_0.5) of 74.8% on the common objects in context (COCO) dataset, 76.3% on the VisDrone2019 dataset, 90.6% on the dataset for object detection in aerial images (DOTA-v1.5), and 71.5% on our SOD-Dataset, with improvements of 7.7%, 6.9%, 4.4%, and 10.9%, respectively.
For mAP_0.5:0.95, the model achieves 57.2%, 58.2%, 68.2%, and 51.7% on COCO, VisDrone2019, DOTA-v1.5, and our SOD-Dataset, with improvements of 5.5%, 16.4%, 3.4%, and 12.1% over the baseline algorithm. Furthermore, ablation experiments and visualization analysis provide additional evidence of the importance of each model component. The code and dataset are publicly available at https://github.com/dhuvisionlab/YOLO-SOD.
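For readers unfamiliar with the two metrics quoted above, the following minimal Python sketch (not taken from the paper's repository) shows the intersection-over-union (IoU) overlap test that underlies both, and how mAP_0.5:0.95 averages over the ten IoU thresholds 0.50, 0.55, ..., 0.95; the `ap_at` argument stands in for a full per-class average-precision computation, which is omitted here.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# mAP_0.5 averages per-class AP at the single IoU threshold 0.5;
# mAP_0.5:0.95 additionally averages AP over the threshold grid below.
thresholds = [0.50 + 0.05 * i for i in range(10)]

def map_50_95(ap_at):
    """Average a per-threshold AP function over the 0.50:0.95 grid."""
    return sum(ap_at(t) for t in thresholds) / len(thresholds)
```

A predicted box counts as a true positive at a given threshold when `iou(pred, gt)` meets or exceeds it, so mAP_0.5:0.95 rewards tighter localization than mAP_0.5 alone.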
Journal introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.