A deep neural network for small object detection in complex environments with unmanned aerial vehicle imagery

Authors: Sayed Jobaer, Xue-song Tang, Yihong Zhang
DOI: 10.1016/j.engappai.2025.110466
Journal: Engineering Applications of Artificial Intelligence, Volume 148, Article 110466 (JCR Q1, Automation & Control Systems; Impact Factor 8.0)
Publication date: 2025-03-08 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S095219762500466X
Deep learning-based object detectors perform effectively on edge devices but encounter challenges with small and flat objects in complex environments, especially under low-light conditions and in high-altitude images captured by unmanned aerial vehicles (UAVs). The primary issue is the pixel similarity between objects and their backgrounds, which makes detection challenging. While existing detectors struggle to detect small and flat objects in these scenarios, you only look once (YOLO) algorithms have shown promise; however, they still have limitations in detecting small and flat objects under these conditions. Due to a shortage of suitable datasets covering complex environments and lighting conditions, the field lacks comprehensive research on detecting small and flat objects in UAV-assisted images. To address these issues, we develop a dataset with nine classes tailored to small object detection (SOD) challenges. We propose a dynamic model based on the YOLOv5 (version 6.2) architecture to overcome the above-mentioned limitations. We introduce the Luna-enhancement mechanism and four novel modules, which enhance the detector's capacity to detect objects in complex environments. Our approach aims to improve the accuracy and robustness of detecting small and flat objects in complex environments, benefiting applications such as aerial surveillance, search and rescue, and autonomous navigation. The experimental results demonstrate that our proposed model achieves a mean average precision (mAP_0.5) of 74.8% on the common objects in context (COCO) dataset, 76.3% on the VisDrone2019 dataset, 90.6% on the dataset for object detection in aerial images (DOTA-v1.5), and 71.5% on our SOD-Dataset, with improvements of 7.7%, 6.9%, 4.4%, and 10.9%, respectively.
For mAP_0.5:0.95, the model achieves 57.2%, 58.2%, 68.2%, and 51.7% on COCO, VisDrone2019, DOTA-v1.5, and our SOD-Dataset, with improvements of 5.5%, 16.4%, 3.4%, and 12.1% over the baseline algorithm. Furthermore, ablation experiments and visualization analysis provide additional evidence of the importance of each model component. The code and dataset are publicly available at https://github.com/dhuvisionlab/YOLO-SOD.
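For readers unfamiliar with the two metrics quoted above, the following minimal Python sketch (not taken from the paper's repository) shows the intersection-over-union (IoU) overlap test that underlies both, and how mAP_0.5:0.95 averages over the ten IoU thresholds 0.50, 0.55, ..., 0.95; the `ap_at` argument stands in for a full per-class average-precision computation, which is omitted here.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# mAP_0.5 averages per-class AP at the single IoU threshold 0.5;
# mAP_0.5:0.95 additionally averages AP over the threshold grid below.
thresholds = [0.50 + 0.05 * i for i in range(10)]

def map_50_95(ap_at):
    """Average a per-threshold AP function over the 0.50:0.95 grid."""
    return sum(ap_at(t) for t in thresholds) / len(thresholds)
```

A predicted box counts as a true positive at a given threshold when `iou(pred, gt)` meets or exceeds it, so mAP_0.5:0.95 rewards tighter localization than mAP_0.5 alone.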
Journal introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.