LERFNet: an enlarged effective receptive field backbone network for enhancing visual drone detection

The Visual Computer Pub Date : 2024-07-01 DOI:10.1007/s00371-024-03527-8

Mohamed Elsayed, Mohamed Reda, Ahmed S. Mashaly, Ahmed S. Amein



Abstract

Recently, the world has witnessed a great increase in drone applications and missions. Drones must be detected quickly, effectively, and precisely when they are being operated illegally. Vision-based anti-drone systems perform efficiently compared to radar- and acoustic-based systems. The effectiveness of drone detection is affected by a number of issues, including the drone’s small size, conflicts with other objects, and noisy backgrounds. This paper enlarges the effective receptive field (ERF) of the feature maps generated by the YOLOv6 backbone. First, RepLKNet, which deploys large kernels with depth-wise convolution, is used as the backbone of YOLOv6. Then, to overcome RepLKNet’s long inference time, a novel backbone, LERFNet, is implemented. LERFNet combines dilated convolution with large kernels to enlarge the ERF, so that each technique compensates for the other’s drawbacks. A linear spatial-channel attention module (LAM) is used to give more attention to the most informative pixels and high feature channels. LERFNet produces output feature maps with a large ERF and high shape bias to enhance the detection of various drone sizes in complex scenes. The RepLKNet and LERFNet backbones for Tiny-YOLOv6, Tiny-YOLOv6, YOLOv5s, and Tiny-YOLOv7 are compared. Compared with these techniques, the proposed model achieves a better balance between accuracy and speed. LERFNet increases the mAP by \(2.8\%\) while significantly reducing the GFLOPs and parameter count compared to the original backbone of YOLOv6.
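The trade-off between large kernels and dilated convolution mentioned in the abstract can be illustrated with a small calculation. The sketch below is not the LERFNet architecture: the helper names, the channel count, and the kernel and dilation values are all assumptions chosen for illustration. It also computes only the theoretical receptive field, which is an upper bound on the effective receptive field that the paper targets.

```python
# Illustrative sketch only: helper names and all numeric values are
# assumptions for this example, not taken from the paper.

def effective_kernel_size(kernel_size: int, dilation: int = 1) -> int:
    """Spatial extent spanned by a k x k kernel with the given dilation."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def depthwise_params(channels: int, kernel_size: int) -> int:
    """Weight count of a depth-wise k x k convolution (bias ignored)."""
    return channels * kernel_size * kernel_size


def stacked_receptive_field(layers) -> int:
    """Theoretical receptive field of stacked stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel_size(kernel_size, dilation) - 1
    return rf


if __name__ == "__main__":
    channels = 64  # hypothetical channel count

    # A dense large kernel versus a small dilated kernel with the same extent.
    for name, k, d in [("dense 31x31", 31, 1), ("7x7, dilation 5", 7, 5)]:
        print(name,
              "extent:", effective_kernel_size(k, d),
              "depth-wise weights:", depthwise_params(channels, k))

    # Three stacked 3x3 layers with growing dilation.
    print("stacked RF:", stacked_receptive_field([(3, 1), (3, 2), (3, 4)]))
```

Under these assumed values, both kernels span the same 31-pixel extent, but the dilated one needs far fewer weights because it samples that extent sparsely. The sparse sampling leaves gaps between taps, which is one plausible reading of why the abstract pairs dilation with dense large kernels rather than using either alone.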


