Improving acoustic species identification using data augmentation within a deep learning framework
Jennifer MacIsaac, Stuart Newson, Adham Ashton-Butt, Huma Pearce, Ben Milner
Ecological Informatics, volume 83, Article 102851. Published 2024-10-15.
DOI: 10.1016/j.ecoinf.2024.102851
https://www.sciencedirect.com/science/article/pii/S1574954124003935
Abstract
Convolutional neural networks (CNNs) are effective tools for acoustic classification tasks such as species identification. Developing CNN classifiers requires large datasets of labelled recordings, which can be difficult to obtain, particularly if species are rare or vocalise infrequently. Additionally, data often require manual labelling, which can be time-consuming and requires expert analysis. Artificially generating data using augmentation can address these challenges; however, the impact of data augmentation on CNN performance is poorly understood and often omitted in bioacoustic studies. Here, we empirically test the impact of CNN architecture and 20 data augmentation methods on classifier performance. We use acoustic identification of 18 small mammal species as a case study of a species group that can be effectively surveyed by acoustic monitoring, but for which training recordings are scarce and difficult to collect. The networks that achieved the highest accuracy across all sample sizes were a 10-layer CNN (Conv10; 96.43 %) and a pre-trained ResNet50 model (96.37 %). Overall, all augmentation effects improved ResNet50 model performance and 17 effects improved Conv10 performance, with relative change in accuracy (RCA) increasing by 0.021–0.641. Three augmentation effects lowered Conv10 performance, with RCA values of −0.042 to −0.182. We also show that adding augmented data when the number of original samples is low has the greatest positive impact on accuracy, and this effect was larger with ResNet50 models. Our work demonstrates that using data augmentation where few original samples are available can considerably improve model performance, and it highlights the potential of augmentation in developing acoustic classifiers for species where data are limited or difficult to obtain.
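The abstract does not name the 20 augmentation methods or give a formula for RCA, so the following is only an illustrative sketch and not the authors' pipeline. It shows two commonly used waveform augmentations (a circular time shift and additive white noise at a target signal-to-noise ratio) and one plausible reading of relative change in accuracy, assumed here to be (augmented − baseline) / baseline. The function names, the synthetic tone and the sample rate are all hypothetical.

```python
import math
import random


def time_shift(samples, shift):
    """Circularly shift a waveform (list of floats) by `shift` samples."""
    if not samples:
        return []
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in samples]


def relative_change(baseline_acc, augmented_acc):
    """Assumed definition of RCA: (augmented - baseline) / baseline."""
    return (augmented_acc - baseline_acc) / baseline_acc


rng = random.Random(0)   # fixed seed so the example is repeatable
sample_rate = 8000       # hypothetical sample rate in Hz
# Synthetic one-second tone standing in for a labelled recording.
call = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# One original sample becomes three training samples with the same label.
augmented_set = [call, time_shift(call, 400), add_noise(call, 20.0, rng)]
```

In a setting like the one described, each augmented copy would keep the label of its source recording, so a small set of originals yields a larger training set. Under the assumed definition, a positive value means augmentation raised accuracy relative to training on the original samples alone.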
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.