Small Object Detection Based on Microscale Perception and Enhancement-Location Feature Pyramid

IF 5 · Zone 3 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-03-07 DOI:10.1109/TCDS.2024.3397684

Guang Han;Chenwei Guo;Ziyang Li;Haitao Zhao

Volume 16 (6), pages 1982-1996 · Journal Article · https://ieeexplore.ieee.org/document/10521894/

Citations: 0

Abstract

Images captured by unmanned aerial vehicles (UAVs) contain many small objects, large scale variation, and uneven object distribution, so existing algorithms suffer high missed-detection and false-detection rates for small objects in drone images. This article proposes a new object detection algorithm based on microscale perception and an enhancement-location feature pyramid. The microscale perception module replaces the original convolution module in the backbone and changes the receptive field through two dilation branches with different dilation rates and an adjustment switch branch. Weighted deformable convolution is employed to better match the size and shape of the sampled targets. The enhancement-location feature pyramid module aggregates the features from each layer to obtain balanced semantic information and then refines the aggregated features to strengthen their representational ability. Moreover, because lower-layer features are beneficial for locating small objects, a bottom-up branch is added to enhance small-object localization. Additionally, specific image cropping and combining techniques alter the target distribution of the training data, making the model more sensitive to small objects and improving its robustness. Finally, a sample balance strategy combines focal loss with a sample extraction control method to address both the easy-hard sample imbalance and the long-tailed interclass sample imbalance during training. Experimental results show that the proposed algorithm achieves a mean average precision of 35.9% on the VisDrone2019 dataset, a 14.2% improvement over the baseline Cascade RCNN, and performs better at detecting small objects in drone images. Compared with advanced algorithms of recent years, it also achieves state-of-the-art detection accuracy.
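The abstract names focal loss as one ingredient of the sample balance strategy. As a minimal sketch of the standard binary focal loss only (not the authors' full strategy, whose sample extraction control method is not described here), the function below shows how confidently classified easy samples are down-weighted relative to hard ones. The function name `focal_loss` and the defaults `alpha=0.25` and `gamma=2.0` are assumptions for illustration, not values taken from the article.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Standard binary focal loss for a single prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 0 or 1
    alpha, gamma: illustrative defaults (assumed, not from the article)
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp from below to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (high confidence) versus a hard positive (low confidence).
easy = focal_loss(0.95, 1)
hard = focal_loss(0.30, 1)
print(f"easy sample loss: {easy:.6f}")
print(f"hard sample loss: {hard:.6f}")
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check when tuning the two parameters.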

Source journal

IEEE Transactions on Cognitive and Developmental Systems Computer Science-Software

CiteScore

7.20

Self-citation rate

10.00%

Articles published

170

Journal description: The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.