Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline.

Impact factor 2.1; Zone 4 (Biology); JCR Q2 (Zoology)
Paul Best, Marcelo Araya-Salas, Axel G Ekström, Bárbara Freitas, Frants H Jensen, Arik Kershenbaum, Adriano R Lameira, Kenna D S Lehmann, Pavel Linhart, Robert C Liu, Malavika Madhavan, Andrew Markham, Marie A Roch, Holly Root-Gutteridge, Martin Šálek, Grace Smith-Vidaurre, Ariana Strandburg-Peshkin, Megan R Warren, Matthew Wijers, Ricard Marxer
Bioacoustics-The International Journal of Animal Sound and Its Recording, vol. 34, no. 4, pp. 419-446, 2025 (Epub 2025-06-02). Journal Article.
DOI: https://doi.org/10.1080/09524622.2025.2500380
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387860/pdf/

Abstract

The fundamental frequency (F0) is a key parameter for characterising structures in vertebrate vocalisations, for instance for defining vocal repertoires and their variations at different biological scales (e.g. population dialects, individual signatures). However, extracting F0 is too laborious to perform manually, and its automation is complex. Despite significant advances in automatic F0 estimation in the fields of speech and music, similar progress in bioacoustics has been limited. To address this gap, we compile and publish a benchmark dataset of over 250,000 calls from 14 taxa, each paired with ground truth F0 values. These vocalisations range from infrasound to ultrasound and from high to low harmonicity, and some include non-linear phenomena. Testing different algorithms on these signals, we demonstrate the potential of neural networks for F0 estimation, even for taxa not seen in training, or when trained without labels. In addition, to indicate how applicable an algorithm is to a given set of signals, we propose spectral measurements of F0 quality which correlate well with performance. While current performance is not satisfactory for all studied taxa, the results suggest that deep learning could yield a more generic and reliable bioacoustic F0 tracker, helping the community to analyse vocalisations via their F0 contours.
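To make the notion of automatic F0 estimation concrete, the following is a minimal sketch of a classic autocorrelation estimator applied to a synthetic harmonic frame. It is not the method evaluated in the paper, which does not specify its algorithms in this abstract; the function name `estimate_f0_autocorr`, the search band, and the test signal are all assumptions chosen for illustration.

```python
import numpy as np


def estimate_f0_autocorr(frame, sample_rate, fmin, fmax):
    """Estimate the F0 (Hz) of one frame from its autocorrelation peak.

    Hypothetical helper for illustration: fmin and fmax bound the search
    band and must be chosen per taxon (infrasound to ultrasound).
    """
    frame = frame - np.mean(frame)  # remove any DC offset
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)  # shortest period considered
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)  # longest period
    # The lag with the strongest self-similarity is taken as the period.
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / lag


# Synthetic "call": a 40 ms frame with a 500 Hz fundamental and two harmonics.
sample_rate = 48_000
true_f0 = 500.0
t = np.arange(int(0.04 * sample_rate)) / sample_rate
frame = sum(
    amp * np.sin(2 * np.pi * true_f0 * h * t)
    for h, amp in [(1, 1.0), (2, 0.5), (3, 0.25)]
)

f0_est = estimate_f0_autocorr(frame, sample_rate, fmin=100.0, fmax=2000.0)
print(f"estimated F0: {f0_est:.1f} Hz (true F0: {true_f0:.1f} Hz)")
```

An estimator of this kind works well on clean, highly harmonic frames, but it can be expected to degrade on the low-harmonicity and non-linear signals the abstract mentions, which is part of the motivation for learned F0 trackers.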

Source journal
CiteScore: 4.50
Self-citation rate: 0.00%
Articles published: 25
Review time: >12 weeks
Journal description: Bioacoustics primarily publishes high-quality original research papers and reviews on sound communication in birds, mammals, amphibians, reptiles, fish, insects and other invertebrates, including the following topics:
- Communication and related behaviour
- Sound production
- Hearing
- Ontogeny and learning
- Bioacoustics in taxonomy and systematics
- Impacts of noise
- Bioacoustics in environmental monitoring
- Identification techniques and applications
- Recording and analysis
- Equipment and techniques
- Ultrasound and infrasound
- Underwater sound
- Bioacoustical sound structures, patterns, variation and repertoires