Pitch tracking of bird vocalizations and an automated process using YIN-bird

Colm O'Reilly, N. Harte
Cogent Biology, 2017. DOI: 10.1080/23312025.2017.1322025 (https://doi.org/10.1080/23312025.2017.1322025)

Abstract

Pitch, or fundamental frequency, is an important feature of bird song from which scientists can learn much about a population. To use pitch as a feature, researchers need confidence in their pitch extraction system. Pitch detection algorithms (PDAs) proven to work on human speech may not be suitable for all types of bird vocalizations. This paper discusses pitch estimation performance on a variety of common bird vocalizations. Multiple simultaneous partials or tones, extended frequency sweeps through multiple octaves, and rapid pitch modulations are just some of the difficulties encountered when estimating the pitch of bird song. YIN is a PDA that estimates the pitch of human speech very well. Carefully tuned parameters improve pitch tracking with YIN, but the optimal parameters can change quickly, even within one song. This paper presents YIN-bird, a modified version of YIN that exploits spectrogram properties to automatically set YIN's minimum fundamental frequency parameter. Gross pitch errors on whistles and trills were reduced by up to 4% on a ground-truth data set of synthetic bird song with known pitch. Expert listeners evaluated this data set and described it as “sounding like original & can hardly tell it is synthetic”. A qualitative analysis is also presented, showing that YIN-bird is not suitable for more complex bird vocalizations such as nasals.
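To make the role of the minimum fundamental frequency parameter concrete, the following is a minimal Python sketch of a simplified YIN estimator together with a gross pitch error measure. It is an illustration under stated assumptions, not the authors' implementation. The names `yin_pitch` and `gross_pitch_error`, the 0.1 threshold, and the 20% gross-error tolerance are all assumptions. Parabolic interpolation and voicing decisions are omitted. How YIN-bird derives the minimum frequency from the spectrogram is not described in the abstract, so `f0_min` is simply passed in by hand here.

```python
import math


def yin_pitch(frame, sample_rate, f0_min, f0_max, threshold=0.1):
    """Estimate the pitch (Hz) of one frame with a simplified YIN.

    f0_min fixes the longest lag searched (tau_max), which is where a
    minimum fundamental frequency parameter enters the algorithm.
    """
    tau_min = max(1, int(sample_rate / f0_max))
    tau_max = int(sample_rate / f0_min)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the chosen f0_min")

    # Difference function d(tau): squared difference between the frame
    # and a copy of itself shifted by tau samples.
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # Cumulative mean normalized difference d'(tau).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0

    # Absolute threshold: take the first lag that dips below the
    # threshold, then walk down to the bottom of that dip.
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau

    # No dip below the threshold: fall back to the global minimum.
    best = min(range(tau_min, tau_max + 1), key=lambda t: cmnd[t])
    return sample_rate / best


def gross_pitch_error(estimates, truth, tolerance=0.2):
    """Fraction of frames whose estimate deviates from the known pitch
    by more than `tolerance` (relative). The 20% default is an assumed
    convention, not a value taken from the paper."""
    gross = sum(1 for e, t in zip(estimates, truth) if abs(e - t) > tolerance * t)
    return gross / len(truth)


if __name__ == "__main__":
    sr = 44100
    # Synthetic 2 kHz whistle, standing in for a ground-truth signal.
    tone = [math.sin(2 * math.pi * 2000.0 * n / sr) for n in range(1024)]
    estimate = yin_pitch(tone, sr, f0_min=500.0, f0_max=10000.0)
    print(estimate, gross_pitch_error([estimate], [2000.0]))
```

In this sketch a lower `f0_min` lengthens the lag search and shortens the usable integration window for a fixed frame length, which is one reason a single hand-tuned value may not suit a whole song.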