Improving acoustic species identification using data augmentation within a deep learning framework

IF 5.8 · Zone 2 (Environmental Science and Ecology) · JCR Q1 (Ecology)

Ecological Informatics, 83, Article 102851. Published 2024-10-15. DOI: 10.1016/j.ecoinf.2024.102851

Jennifer MacIsaac, Stuart Newson, Adham Ashton-Butt, Huma Pearce, Ben Milner

Abstract

Convolutional neural networks (CNNs) are effective tools for acoustic classification tasks such as species identification. Developing CNN classifiers requires large datasets of labelled recordings, which can be difficult to obtain, particularly if species are rare or vocalise infrequently. Additionally, data often require manual labelling, which is time-consuming and requires expert analysis. Artificially generating data through augmentation can address these challenges; however, the impact of data augmentation on CNN performance is poorly understood and is often omitted from bioacoustic studies. Here, we empirically test the impact of CNN architecture and 20 data augmentation methods on classifier performance. We use acoustic identification of 18 small mammal species as a case study of a species group that can be surveyed effectively by acoustic monitoring, but for which training recordings are scarce and difficult to collect. The networks that achieved the highest accuracy across all sample sizes were a 10-layer CNN (Conv10; 96.43%) and a pre-trained ResNet50 model (96.37%). Overall, all augmentation effects improved ResNet50 performance and 17 effects improved Conv10 performance, with relative changes in accuracy (RCA) of 0.021–0.641. Three augmentation effects reduced Conv10 performance, with RCA of −0.042 to −0.182. We also show that adding augmented data has the greatest positive impact on accuracy when the number of original samples is low, and that this effect is larger for ResNet50 models. Our work demonstrates that data augmentation can considerably improve model performance where few original samples are available, and highlights its potential for developing acoustic classifiers for species whose data are limited or difficult to obtain.
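The abstract does not name the 20 augmentation methods or give the formula for RCA, so the following is only a minimal, standard-library sketch of the general idea, not the authors' pipeline. It assumes four common waveform augmentations (circular time shift, additive Gaussian noise at a target SNR, random gain, and zeroing a random segment as a waveform analogue of SpecAugment time masking), and it assumes RCA is computed as (augmented accuracy − baseline accuracy) / baseline accuracy. All function names, parameters and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]


def time_shift(x: Sequence[float], rng: random.Random, max_frac: float = 0.2) -> Waveform:
    """Circularly shift the signal by up to max_frac of its length, in either direction."""
    n = len(x)
    limit = int(max_frac * n)
    k = rng.randint(-limit, limit) % n if n else 0
    return list(x[-k:]) + list(x[:-k]) if k else list(x)


def add_noise(x: Sequence[float], rng: random.Random, snr_db: float = 20.0) -> Waveform:
    """Add white Gaussian noise scaled so the result has roughly the target SNR (dB)."""
    power = sum(v * v for v in x) / len(x)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [v + rng.gauss(0.0, noise_std) for v in x]


def random_gain(x: Sequence[float], rng: random.Random, max_db: float = 6.0) -> Waveform:
    """Scale the amplitude by a gain drawn uniformly from [-max_db, +max_db] dB."""
    g = 10 ** (rng.uniform(-max_db, max_db) / 20)
    return [v * g for v in x]


def time_mask(x: Sequence[float], rng: random.Random, max_frac: float = 0.1) -> Waveform:
    """Zero out one random contiguous segment of up to max_frac of the signal."""
    n = len(x)
    width = rng.randint(0, int(max_frac * n))
    start = rng.randint(0, n - width)
    return [0.0 if start <= i < start + width else v for i, v in enumerate(x)]


# Hypothetical pool of augmentations; one is picked at random per augmented copy.
AUGMENTATIONS: List[Callable[[Sequence[float], random.Random], Waveform]] = [
    time_shift, add_noise, random_gain, time_mask,
]


def augment_dataset(samples: Sequence[Sequence[float]], labels: Sequence[str],
                    copies_per_sample: int, seed: int = 0) -> Tuple[List[Waveform], List[str]]:
    """Return the originals plus copies_per_sample augmented copies of each, labels preserved."""
    rng = random.Random(seed)
    out_x = [list(s) for s in samples]
    out_y = list(labels)
    for s, y in zip(samples, labels):
        for _ in range(copies_per_sample):
            aug = rng.choice(AUGMENTATIONS)
            out_x.append(aug(s, rng))
            out_y.append(y)  # an augmented copy keeps its source recording's label
    return out_x, out_y


def relative_change_in_accuracy(acc_augmented: float, acc_baseline: float) -> float:
    """RCA, assumed here to be (augmented - baseline) / baseline."""
    return (acc_augmented - acc_baseline) / acc_baseline


if __name__ == "__main__":
    # Toy stand-in for labelled calls: a 0.1 s, 1 kHz tone sampled at 8 kHz.
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]
    xs, ys = augment_dataset([tone, tone], ["species_a", "species_b"], copies_per_sample=3)
    print(len(xs), ys.count("species_a"))  # 8 4
    # Hypothetical accuracies, not results from the study.
    print(round(relative_change_in_accuracy(0.80, 0.60), 3))  # 0.333
```

Each augmented copy inherits the label of its source recording, so a handful of labelled calls per species can be expanded into a larger training set without further manual labelling. Augmentation is normally applied only to the training split, so that test accuracy is still measured on unaltered recordings.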

Source journal: Ecological Informatics (Environmental Science: Ecology)

CiteScore: 8.30

Self-citation rate: 11.80%

Articles published: 346

Review time: 46 days

About the journal: The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change. The journal is interdisciplinary, sitting at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.