AudioProtoPNet: An interpretable deep learning model for bird sound classification
René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz
Ecological Informatics, Volume 87, Article 103081 (published 2025-02-23). DOI: 10.1016/j.ecoinf.2025.103081
Citations: 0
Abstract
Deep learning models have significantly advanced acoustic bird monitoring by recognizing numerous bird species based on their vocalizations. However, traditional deep learning models are black boxes that provide no insight into their underlying computations, limiting their usefulness to ornithologists and machine learning engineers. Explainable models could facilitate debugging, knowledge discovery, trust, and interdisciplinary collaboration. We introduce AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is inherently interpretable, leveraging a ConvNeXt backbone to extract embeddings and a prototype learning classifier trained on these embeddings. The classifier learns prototypical patterns of each bird species’ vocalizations from spectrograms of instances in the training data. During inference, recordings are classified by comparing them to the learned prototypes in the embedding space, providing explanations for the model’s decisions and insights into the most informative embeddings of each bird species. The model was trained on the BirdSet training dataset, which comprises 9734 bird species and over 6800 hours of recordings. Its performance was evaluated on the seven BirdSet test datasets, which cover different geographical regions. AudioProtoPNet outperformed the state-of-the-art bird sound classification model Perch, which itself surpasses the more widely used BirdNET, achieving an average AUROC of 0.90 and a cmAP of 0.42, with relative improvements of 7.1% and 16.7% over Perch, respectively. These results demonstrate that even for the challenging task of multi-label bird sound classification, it is possible to develop powerful yet interpretable deep learning models that provide valuable insights for professionals in ornithology and machine learning.
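To make the prototype-comparison step concrete, the sketch below shows a minimal multi-label prototype classifier head of the kind the abstract describes. It is an illustrative approximation under stated assumptions, not the authors' released implementation: the embedding dimension, the number of prototypes per class, and the use of max-pooled cosine similarity (classic ProtoPNet uses an L2-distance-based similarity) are choices made here for clarity, and the class names and shapes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeHead(nn.Module):
    """Illustrative prototype-learning classifier head (not the official AudioProtoPNet code).

    Spectrogram embeddings from a backbone such as ConvNeXt are compared to
    learnable class prototypes; each prototype's strongest match anywhere in
    the embedding map becomes its similarity score, and scores are combined
    into multi-label logits.
    """

    def __init__(self, num_classes: int, embed_dim: int = 768, prototypes_per_class: int = 5):
        super().__init__()
        num_prototypes = num_classes * prototypes_per_class
        # One learnable prototype vector per (class, slot) pair.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))
        # Each class score is a weighted sum of its prototype similarities.
        self.class_weights = nn.Parameter(torch.ones(num_classes, num_prototypes) / prototypes_per_class)

    def forward(self, embeddings: torch.Tensor):
        """embeddings: (batch, embed_dim, T, F) feature map from the backbone."""
        # Flatten the time-frequency grid so every patch is a candidate match.
        patches = embeddings.flatten(2).transpose(1, 2)    # (batch, T*F, embed_dim)
        patches = F.normalize(patches, dim=-1)
        protos = F.normalize(self.prototypes, dim=-1)      # (P, embed_dim)
        # Cosine similarity of every patch to every prototype.
        sims = patches @ protos.t()                        # (batch, T*F, P)
        # Max-pool over patches: "how strongly does this prototype occur anywhere?"
        max_sims, _ = sims.max(dim=1)                      # (batch, P)
        logits = max_sims @ self.class_weights.t()         # (batch, num_classes)
        return logits, max_sims


# Usage: multi-label prediction with an independent sigmoid per species.
head = PrototypeHead(num_classes=10, embed_dim=768)  # 9734 classes in the full BirdSet setting
dummy = torch.randn(2, 768, 12, 8)   # stand-in for ConvNeXt features of a spectrogram
logits, similarities = head(dummy)
probs = torch.sigmoid(logits)        # per-class probabilities for multi-label output
```

The max-pooled similarities are what make such a model inherently interpretable: for each prediction, one can report which prototype fired and at which time-frequency location, and trace that prototype back to the training spectrogram patch it most resembles.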
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, as well as the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary, at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.