Context-Aware Data Augmentation for Efficient Object Detection by UAV Surveillance

Yuri G. Gordienko, Oleksandr Rokovyi, Oleg Alienin, S. Stirenko
{"title":"Context-Aware Data Augmentation for Efficient Object Detection by UAV Surveillance","authors":"Yuri G. Gordienko, Oleksandr Rokovyi, Oleg Alienin, S. Stirenko","doi":"10.1109/ISDFS55398.2022.9800798","DOIUrl":null,"url":null,"abstract":"The problem of object detection by YOLOv4 deep neural network (DNN) is considered on Stanford drone dataset (SDD) with object classes (pedestrians, bicyclists, cars, skateboarders, golf carts, and buses) collected by Unmanned Aerial Vehicle (UAV) video surveillance. Some frames (images) with labels were extracted from videos of this dataset and structured in the open-access SDD frames (SDDF) version (https://www.kaggle.com/yoctoman/stanford-drone-dataset-frames). The context-aware data augmentation (CADA) was proposed to change bounding box (BB) sizes by some percentage of its width and height. To investigate the possible effect of the dataset labeling quality the \"dirty\" and \"clean\" dataset versions were prepared, which differ by the evaluation subset only. CADA procedures lead to significant improvement of performance by loss and mean average precision (mAP) that can be observed both for \"dirty\" and \"clean\" evaluation subsets in comparison to experiments without CADA. Moreover, CADA procedures allow to get the mAP values on the \"dirty\" (real) evaluation subset that can be similar (and for some classes higher even) to the mAP values on the \"clean\" (ground-truth - GT) evaluation subset without CADA procedures. This effect can be explained by increase of signal-to-noise ratios for object-to-background pairs after IN-like cropping CADA procedures and then by increase of variability of object-to-background pair after subsequent OUT-like enlarging CADA procedures. It should be noted the non-commutative nature of CADA-based retraining procedures because their reverse direction like first-OUT-then-IN CADA in contrast to first-IN-then-OUT CADA did not lead to such a big increase of mAP values. Several CADA-sequences were analyzed and the best strategy consists in first-IN-then-OUT CADA procedures, where the extent of decrease and increase of BBs width and height can be different for various applications and datasets.","PeriodicalId":114335,"journal":{"name":"2022 10th International Symposium on Digital Forensics and Security (ISDFS)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 10th International Symposium on Digital Forensics and Security (ISDFS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISDFS55398.2022.9800798","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The problem of object detection with the YOLOv4 deep neural network (DNN) is considered on the Stanford Drone Dataset (SDD), which contains object classes (pedestrians, bicyclists, cars, skateboarders, golf carts, and buses) collected by Unmanned Aerial Vehicle (UAV) video surveillance. Labeled frames (images) were extracted from the dataset's videos and structured into the open-access SDD Frames (SDDF) version (https://www.kaggle.com/yoctoman/stanford-drone-dataset-frames). Context-aware data augmentation (CADA) is proposed, which changes bounding box (BB) sizes by some percentage of their width and height. To investigate the possible effect of dataset labeling quality, "dirty" and "clean" versions of the dataset were prepared, which differ only in the evaluation subset. Compared with experiments without CADA, the CADA procedures significantly improve performance in terms of loss and mean average precision (mAP), and this improvement is observed on both the "dirty" and "clean" evaluation subsets. Moreover, CADA yields mAP values on the "dirty" (real) evaluation subset that are similar to (and for some classes even higher than) the mAP values obtained on the "clean" (ground-truth, GT) evaluation subset without CADA. This effect can be explained by an increase in the signal-to-noise ratio of object-to-background pairs after the IN-like cropping CADA procedures, followed by an increase in the variability of object-to-background pairs after the subsequent OUT-like enlarging CADA procedures. Notably, CADA-based retraining is non-commutative: the reverse order, first-OUT-then-IN CADA, did not produce as large an increase in mAP values as first-IN-then-OUT CADA. Several CADA sequences were analyzed, and the best strategy consists of first-IN-then-OUT CADA procedures, where the extent of the decrease and increase of BB width and height may differ across applications and datasets.
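The paper does not publish an implementation, but the bounding-box resizing at the core of CADA can be sketched from the abstract's description: each BB is shrunk or enlarged by a percentage of its own width and height. The following minimal Python sketch illustrates that idea; the function name, center-preserving resize, and clipping behavior are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch only: the paper gives no reference implementation,
# so cada_resize_bbox and its parameters are hypothetical. It scales a
# bounding box by a percentage of its width and height, keeping the
# center fixed, as the abstract describes for CADA.

def cada_resize_bbox(x, y, w, h, pct, img_w, img_h):
    """Resize a bounding box by `pct` percent of its width and height.

    Negative `pct` shrinks the box (IN-like cropping CADA);
    positive `pct` enlarges it (OUT-like enlarging CADA).
    The box is given as top-left corner (x, y) with size (w, h),
    and the result is clipped to the image bounds (img_w, img_h).
    """
    dw = w * pct / 100.0
    dh = h * pct / 100.0
    # Keep the box center fixed while changing width and height.
    new_x = x - dw / 2.0
    new_y = y - dh / 2.0
    new_w = w + dw
    new_h = h + dh
    # Clip to image boundaries so the augmented box stays valid.
    new_x = max(0.0, new_x)
    new_y = max(0.0, new_y)
    new_w = min(new_w, img_w - new_x)
    new_h = min(new_h, img_h - new_y)
    return new_x, new_y, new_w, new_h


# First-IN-then-OUT order from the paper: retrain on shrunk boxes first,
# then on enlarged ones. The reverse (first-OUT-then-IN) sequence gave
# smaller mAP gains, i.e. the procedure is non-commutative.
in_box = cada_resize_bbox(100, 50, 40, 80, pct=-20, img_w=1920, img_h=1080)
out_box = cada_resize_bbox(100, 50, 40, 80, pct=+20, img_w=1920, img_h=1080)
```

The exact percentage (20% here) is an arbitrary placeholder: per the abstract, the extent of the decrease and increase can differ across applications and datasets.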