C. Chalmers, P. Fergus, C. C. Montañez, S. Longmore, S. Wich
{"title":"Video analysis for the detection of animals using convolutional neural networks and consumer-grade drones","authors":"C. Chalmers, P. Fergus, C. C. Montañez, S. Longmore, S. Wich","doi":"10.1139/JUVS-2020-0018","DOIUrl":null,"url":null,"abstract":"Determining animal distribution and density is important in conservation. The process is both time-consuming and labour-intensive. Drones have been used to help mitigate human-intensive tasks by covering large geographical areas over a much shorter timescale. In this paper we investigate this idea further using a proof of concept to detect rhinos and cars from drone footage. The proof of concept utilises off-the-shelf technology and consumer-grade drone hardware. The study demonstrates the feasibility of using machine learning (ML) to automate routine conservation tasks, such as animal detection and tracking. The prototype has been developed using a DJI Mavic Pro 2 and tested over a global system for mobile communications (GSM) network. The Faster-RCNN Resnet 101 architecture is used for transfer learning. Inference is performed with a frame sampling technique to address the required trade-off between precision, processing speed, and live video feed synchronisation. Inference models are hosted on a web platform and video streams from the drone (using OcuSync) are transmitted to a real-time messaging protocol (RTMP) server for subsequent classification. During training, the best model achieves a mean average precision (mAP) of 0.83 intersection over union (@IOU) 0.50 and 0.69 @IOU 0.75, respectively. On testing the system in Knowsley Safari our prototype was able to achieve the following: sensitivity (Sen), 0.91 (0.869, 0.94); specificity (Spec), 0.78 (0.74, 0.82); and an accuracy (ACC), 0.84 (0.81, 0.87) when detecting rhinos, and Sen, 1.00 (1.00, 1.00); Spec, 1.00 (1.00, 1.00); and an ACC, 1.00 (1.00, 1.00) when detecting cars.","PeriodicalId":45619,"journal":{"name":"Journal of Unmanned Vehicle Systems","volume":null,"pages":null},"PeriodicalIF":1.3000,"publicationDate":"2021-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Unmanned Vehicle Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1139/JUVS-2020-0018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"REMOTE SENSING","Score":null,"Total":0}
引用次数: 6
Abstract
Determining animal distribution and density is important in conservation. The process is both time-consuming and labour-intensive. Drones have been used to help mitigate human-intensive tasks by covering large geographical areas over a much shorter timescale. In this paper we investigate this idea further using a proof of concept to detect rhinos and cars from drone footage. The proof of concept utilises off-the-shelf technology and consumer-grade drone hardware. The study demonstrates the feasibility of using machine learning (ML) to automate routine conservation tasks, such as animal detection and tracking. The prototype has been developed using a DJI Mavic Pro 2 and tested over a global system for mobile communications (GSM) network. The Faster-RCNN Resnet 101 architecture is used for transfer learning. Inference is performed with a frame sampling technique to address the required trade-off between precision, processing speed, and live video feed synchronisation. Inference models are hosted on a web platform and video streams from the drone (using OcuSync) are transmitted to a real-time messaging protocol (RTMP) server for subsequent classification. During training, the best model achieves a mean average precision (mAP) of 0.83 intersection over union (@IOU) 0.50 and 0.69 @IOU 0.75, respectively. On testing the system in Knowsley Safari our prototype was able to achieve the following: sensitivity (Sen), 0.91 (0.869, 0.94); specificity (Spec), 0.78 (0.74, 0.82); and an accuracy (ACC), 0.84 (0.81, 0.87) when detecting rhinos, and Sen, 1.00 (1.00, 1.00); Spec, 1.00 (1.00, 1.00); and an ACC, 1.00 (1.00, 1.00) when detecting cars.