{"title":"Automating Bird Detection Based on Webcam Captured Images using Deep Learning","authors":"Alex Mirugwe, Juwa Nyirenda, Emmanuel Dufourq","doi":"10.29007/9fr5","DOIUrl":null,"url":null,"abstract":"One of the most challenging problems faced by ecologists and other biological re- searchers today is to analyze the massive amounts of data being collected by advanced monitoring systems like camera traps, wireless sensor networks, high-frequency radio track- ers, global positioning systems, and satellite tracking systems being used today. It has become expensive, laborious, and time-consuming to analyze this huge data using man- ual and traditional statistical techniques. Recent developments in the deep learning field are showing promising results towards automating the analysis of these extremely large datasets. The primary objective of this study was to test the capabilities of the state-of- the-art deep learning architectures to detect birds in the webcam captured images. A total of 10592 images were collected for this study from the Cornell Lab of Ornithology live stream feeds situated in six unique locations in United States, Ecuador, New Zealand, and Panama. To achieve the main objective of the study, we studied and evaluated two con- volutional neural network object detection meta-architectures, single-shot detector (SSD) and Faster R-CNN in combination with MobileNet-V2, ResNet50, ResNet101, ResNet152, and Inception ResNet-V2 feature extractors. Through transfer learning, all the models were initialized using weights pre-trained on the MS COCO (Microsoft Common Objects in Context) dataset provided by TensorFlow 2 object detection API. The Faster R-CNN model coupled with ResNet152 outperformed all other models with a mean average preci- sion of 92.3%. However, the SSD model with the MobileNet-V2 feature extraction network achieved the lowest inference time (110ms) and the smallest memory capacity (30.5MB) compared to its counterparts. 
The outstanding results achieved in this study confirm that deep learning-based algorithms are capable of detecting birds of different sizes in differ- ent environments and the best model could potentially help ecologists in monitoring and identifying birds from other species.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"EPiC series in computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.29007/9fr5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
One of the most challenging problems faced by ecologists and other biological researchers today is analyzing the massive amounts of data collected by advanced monitoring systems such as camera traps, wireless sensor networks, high-frequency radio trackers, global positioning systems, and satellite tracking systems. Analyzing these huge datasets with manual and traditional statistical techniques has become expensive, laborious, and time-consuming. Recent developments in deep learning are showing promising results toward automating the analysis of such extremely large datasets. The primary objective of this study was to test the capability of state-of-the-art deep learning architectures to detect birds in webcam-captured images. A total of 10,592 images were collected for this study from Cornell Lab of Ornithology live-stream feeds at six unique locations in the United States, Ecuador, New Zealand, and Panama. To achieve this objective, we evaluated two convolutional neural network object detection meta-architectures, the single-shot detector (SSD) and Faster R-CNN, in combination with the MobileNet-V2, ResNet50, ResNet101, ResNet152, and Inception-ResNet-V2 feature extractors. Through transfer learning, all models were initialized with weights pre-trained on the MS COCO (Microsoft Common Objects in Context) dataset, provided by the TensorFlow 2 Object Detection API. The Faster R-CNN model coupled with ResNet152 outperformed all other models, with a mean average precision of 92.3%. However, the SSD model with the MobileNet-V2 feature extraction network achieved the lowest inference time (110 ms) and the smallest memory footprint (30.5 MB) of all the models.
The results achieved in this study confirm that deep learning-based algorithms are capable of detecting birds of different sizes in different environments, and that the best model could help ecologists monitor birds and distinguish them from other species.
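The headline metric above, mean average precision (mAP), is the mean of the per-class average precision scores. As a rough illustration of the underlying computation, the sketch below computes all-point interpolated average precision for a single class from hypothetical detections; the function name, the toy detection list, and the `(score, is_true_positive)` representation are illustrative assumptions, not code from the paper (COCO-style mAP additionally averages over IoU thresholds).

```python
def average_precision(detections, num_gt):
    """All-point interpolated average precision for one class.

    detections: list of (confidence_score, is_true_positive) pairs,
                one per predicted box.
    num_gt:     number of ground-truth boxes for this class.
    """
    # Rank detections by confidence, highest first.
    detections = sorted(detections, key=lambda d: d[0], reverse=True)

    # Walk down the ranking, accumulating (recall, precision) points.
    tp = fp = 0
    points = []
    for _score, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))

    # Integrate precision over recall, using the interpolated precision:
    # the maximum precision at any recall level >= the current one.
    ap = 0.0
    prev_recall = 0.0
    for i, (recall, _prec) in enumerate(points):
        interp_prec = max(p for _r, p in points[i:])
        ap += (recall - prev_recall) * interp_prec
        prev_recall = recall
    return ap


# Toy example: two ground-truth birds, three detections, the second a miss.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
print(round(ap, 4))  # 0.8333
```

Mean average precision is then just the mean of this value across all classes (here there is a single "bird" class, so AP and mAP coincide).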