Detection of Big Animals on Images with Road Scenes using Deep Learning

2019 International Conference on Artificial Intelligence: Applications and Innovations (IC-AIAI). Pub Date: 2019-09-01. DOI: 10.1109/IC-AIAI48757.2019.00028

D. Yudin, A. Sotnikov, A. Krishtopik


Citations: 7

Abstract

The recognition of big animals in images of road scenes has received little attention in modern research, and very few specialized data sets exist for this task. Popular open data sets contain many images of big animals, but most of them do not correspond to road scenes, which is what on-board vision systems of unmanned vehicles require. The paper describes the preparation of such a specialized data set based on the Google Open Images and COCO datasets. The resulting data set contains about 20,000 images of big animals in 10 classes: "Bear", "Fox", "Dog", "Horse", "Goat", "Sheep", "Cow", "Zebra", "Elephant", "Giraffe". The paper investigates deep learning approaches to detecting these objects. The authors trained and tested the modern neural network architectures YOLOv3, RetinaNet R-50-FPN, Faster R-CNN R-50-FPN, and Cascade R-CNN R-50-FPN. To compare the approaches, the mean average precision (mAP) was determined at IoU ≥ 50%, and speed was measured for an input tensor size of 640x384x3. YOLOv3 demonstrated the highest quality metrics for both ten-class detection (0.78 mAP) and single joint-class detection (0.92 mAP), at a speed of more than 35 fps on an NVIDIA Tesla V100 32GB video card. On the same hardware, the RetinaNet R-50-FPN architecture provided a recognition speed of more than 44 fps with a 13% lower mAP. The software was implemented using the Keras and PyTorch deep learning libraries and NVIDIA CUDA technology. The proposed data set and neural network approach to recognizing big animals in images have proven effective and can be used in the on-board vision systems of driverless cars or in driver assistance systems.
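The comparison metric in the abstract, average precision at an IoU threshold of 50%, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The box format `(x_min, y_min, x_max, y_max)`, the function names, the greedy matching rule, and the non-interpolated form of average precision are all assumptions chosen for brevity. Established benchmarks such as Pascal VOC and COCO use their own matching and interpolation conventions, and the abstract does not say which protocol was followed.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,
) -> float:
    """Non-interpolated AP for one class (illustrative convention only).

    detections: (confidence score, box) pairs; ground_truth: annotated boxes.
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    hits = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue  # each ground-truth box may be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_hit = best_idx >= 0 and best_iou >= iou_threshold
        if is_hit:
            matched[best_idx] = True
        hits.append(is_hit)
    # Sum precision at every rank where a true positive occurs.
    true_positives, precision_sum = 0, 0.0
    for rank, is_hit in enumerate(hits, start=1):
        if is_hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(ground_truth)


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP as the plain mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

Under this sketch, a ten-class score would be the mean of ten per-class AP values. A joint-class score would come from relabeling every animal as a single class before calling `average_precision`.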

