Alastair Pickering, Santiago Martinez Balvanera, Kate E. Jones, Daniela Hedwig
{"title":"一个可扩展的迁移学习工作流,用于从森林象的发声中提取生物学和行为学见解","authors":"Alastair Pickering, Santiago Martinez Balvanera, Kate E. Jones, Daniela Hedwig","doi":"10.1002/rse2.70008","DOIUrl":null,"url":null,"abstract":"Animal vocalizations encode rich biological information—such as age, sex, behavioural context and emotional state—making bioacoustic analysis a promising non‐invasive method for assessing welfare and population demography. However, traditional bioacoustic approaches, which rely on manually defined acoustic features, are time‐consuming, require specialized expertise and may introduce subjective bias. These constraints reduce the feasibility of analysing increasingly large datasets generated by passive acoustic monitoring (PAM). Transfer learning with Convolutional Neural Networks (CNNs) offers a scalable alternative by enabling automatic acoustic feature extraction without predefined criteria. Here, we applied four pre‐trained CNNs—two general purpose models (VGGish and YAMNet) and two avian bioacoustic models (Perch and BirdNET)—to African forest elephant (<jats:italic>Loxodonta cyclotis</jats:italic>) recordings. We used a dimensionality reduction algorithm (UMAP) to represent the extracted acoustic features in two dimensions and evaluated these representations across three key tasks: (1) call‐type classification (rumble, roar and trumpet), (2) rumble sub‐type identification and (3) behavioural and demographic analysis. A Random Forest classifier trained on these features achieved near‐perfect accuracy for rumbles, with Perch attaining the highest average accuracy (0.85) across all call types. Clustering the reduced features identified biologically meaningful rumble sub‐types—such as adult female calls linked to logistics—and provided clearer groupings than manual classification. Statistical analyses showed that factors including age and behavioural context significantly influenced call variation (<jats:italic>P</jats:italic> < 0.001), with additional comparisons revealing clear differences among contexts (e.g. nursing, competition, separation), sexes and multiple age classes. Perch and BirdNET consistently outperformed general purpose models when dealing with complex or ambiguous calls. These findings demonstrate that transfer learning enables scalable, reproducible bioacoustic workflows capable of detecting biologically meaningful acoustic variation. Integrating this approach into PAM pipelines can enhance the non‐invasive assessment of population dynamics, behaviour and welfare in acoustically active species.","PeriodicalId":21132,"journal":{"name":"Remote Sensing in Ecology and Conservation","volume":"219 1","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A scalable transfer learning workflow for extracting biological and behavioural insights from forest elephant vocalizations\",\"authors\":\"Alastair Pickering, Santiago Martinez Balvanera, Kate E. Jones, Daniela Hedwig\",\"doi\":\"10.1002/rse2.70008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Animal vocalizations encode rich biological information—such as age, sex, behavioural context and emotional state—making bioacoustic analysis a promising non‐invasive method for assessing welfare and population demography. However, traditional bioacoustic approaches, which rely on manually defined acoustic features, are time‐consuming, require specialized expertise and may introduce subjective bias. These constraints reduce the feasibility of analysing increasingly large datasets generated by passive acoustic monitoring (PAM). Transfer learning with Convolutional Neural Networks (CNNs) offers a scalable alternative by enabling automatic acoustic feature extraction without predefined criteria. Here, we applied four pre‐trained CNNs—two general purpose models (VGGish and YAMNet) and two avian bioacoustic models (Perch and BirdNET)—to African forest elephant (<jats:italic>Loxodonta cyclotis</jats:italic>) recordings. We used a dimensionality reduction algorithm (UMAP) to represent the extracted acoustic features in two dimensions and evaluated these representations across three key tasks: (1) call‐type classification (rumble, roar and trumpet), (2) rumble sub‐type identification and (3) behavioural and demographic analysis. A Random Forest classifier trained on these features achieved near‐perfect accuracy for rumbles, with Perch attaining the highest average accuracy (0.85) across all call types. Clustering the reduced features identified biologically meaningful rumble sub‐types—such as adult female calls linked to logistics—and provided clearer groupings than manual classification. Statistical analyses showed that factors including age and behavioural context significantly influenced call variation (<jats:italic>P</jats:italic> < 0.001), with additional comparisons revealing clear differences among contexts (e.g. nursing, competition, separation), sexes and multiple age classes. Perch and BirdNET consistently outperformed general purpose models when dealing with complex or ambiguous calls. These findings demonstrate that transfer learning enables scalable, reproducible bioacoustic workflows capable of detecting biologically meaningful acoustic variation. Integrating this approach into PAM pipelines can enhance the non‐invasive assessment of population dynamics, behaviour and welfare in acoustically active species.\",\"PeriodicalId\":21132,\"journal\":{\"name\":\"Remote Sensing in Ecology and Conservation\",\"volume\":\"219 1\",\"pages\":\"\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Remote Sensing in Ecology and Conservation\",\"FirstCategoryId\":\"93\",\"ListUrlMain\":\"https://doi.org/10.1002/rse2.70008\",\"RegionNum\":2,\"RegionCategory\":\"环境科学与生态学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ECOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Remote Sensing in Ecology and Conservation","FirstCategoryId":"93","ListUrlMain":"https://doi.org/10.1002/rse2.70008","RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ECOLOGY","Score":null,"Total":0}
A scalable transfer learning workflow for extracting biological and behavioural insights from forest elephant vocalizations
Animal vocalizations encode rich biological information—such as age, sex, behavioural context and emotional state—making bioacoustic analysis a promising non‐invasive method for assessing welfare and population demography. However, traditional bioacoustic approaches, which rely on manually defined acoustic features, are time‐consuming, require specialized expertise and may introduce subjective bias. These constraints reduce the feasibility of analysing increasingly large datasets generated by passive acoustic monitoring (PAM). Transfer learning with Convolutional Neural Networks (CNNs) offers a scalable alternative by enabling automatic acoustic feature extraction without predefined criteria. Here, we applied four pre‐trained CNNs—two general purpose models (VGGish and YAMNet) and two avian bioacoustic models (Perch and BirdNET)—to African forest elephant (Loxodonta cyclotis) recordings. We used a dimensionality reduction algorithm (UMAP) to represent the extracted acoustic features in two dimensions and evaluated these representations across three key tasks: (1) call‐type classification (rumble, roar and trumpet), (2) rumble sub‐type identification and (3) behavioural and demographic analysis. A Random Forest classifier trained on these features achieved near‐perfect accuracy for rumbles, with Perch attaining the highest average accuracy (0.85) across all call types. Clustering the reduced features identified biologically meaningful rumble sub‐types—such as adult female calls linked to logistics—and provided clearer groupings than manual classification. Statistical analyses showed that factors including age and behavioural context significantly influenced call variation (P < 0.001), with additional comparisons revealing clear differences among contexts (e.g. nursing, competition, separation), sexes and multiple age classes. Perch and BirdNET consistently outperformed general purpose models when dealing with complex or ambiguous calls. These findings demonstrate that transfer learning enables scalable, reproducible bioacoustic workflows capable of detecting biologically meaningful acoustic variation. Integrating this approach into PAM pipelines can enhance the non‐invasive assessment of population dynamics, behaviour and welfare in acoustically active species.
期刊介绍:
emote Sensing in Ecology and Conservation provides a forum for rapid, peer-reviewed publication of novel, multidisciplinary research at the interface between remote sensing science and ecology and conservation. The journal prioritizes findings that advance the scientific basis of ecology and conservation, promoting the development of remote-sensing based methods relevant to the management of land use and biological systems at all levels, from populations and species to ecosystems and biomes. The journal defines remote sensing in its broadest sense, including data acquisition by hand-held and fixed ground-based sensors, such as camera traps and acoustic recorders, and sensors on airplanes and satellites. The intended journal’s audience includes ecologists, conservation scientists, policy makers, managers of terrestrial and aquatic systems, remote sensing scientists, and students.
Remote Sensing in Ecology and Conservation is a fully open access journal from Wiley and the Zoological Society of London. Remote sensing has enormous potential as to provide information on the state of, and pressures on, biological diversity and ecosystem services, at multiple spatial and temporal scales. This new publication provides a forum for multidisciplinary research in remote sensing science, ecological research and conservation science.