ORCA-WHISPER: An Automatic Killer Whale Sound Type Generation Toolkit Using Deep Learning

Christian Bergler, Alexander Barnhill, Dominik Perrin, M. Schmitt, A. Maier, E. Nöth
Interspeech, pp. 2413-2417. Published 2022-09-18. DOI: 10.21437/interspeech.2022-846 (https://doi.org/10.21437/interspeech.2022-846)
Citations: 1

Abstract

Even today, the understanding and interpretation of animal-specific vocalization paradigms rests largely on historical, manual analysis of comparatively small data corpora, primarily because of time and human-resource limitations and the scarcity of species-related machine-learning techniques. Partial human-based data inspection represents neither the overall real-world vocal repertoire nor the variation within intra- and inter-animal-specific call type portfolios, and typically yields only small collections of category-specific ground truth data. Modern machine (deep) learning concepts are an essential requirement for identifying statistically significant animal-related vocalization patterns within massive bioacoustic data archives. However, purely supervised training is challenging because call-specific ground truth data is limited and the class imbalance between individual call types is strong. The current study is the first to present a deep bioacoustic signal generation framework, ORCA-WHISPER, a Generative Adversarial Network (GAN) trained on low-resource killer whale (Orcinus orca) call type data. Besides audiovisual inspection, supervised call type classification, and model transferability, the promising quality of the generated fake vocalizations was further demonstrated by visualizing, representing, and enhancing the real-world orca signal data manifold. Moreover, integrating the fake signals into the original data partition outperformed previous orca/noise segmentation results.
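The abstract mentions strong class imbalance between call types and the integration of generated signals into the original data partition. As a rough illustration only, the sketch below computes how many generated samples each call type would need to reach a common class size. The function name `synthetic_quota`, the label names, and the counts are all hypothetical and are not taken from the paper, which may balance or mix its data differently.

```python
from collections import Counter


def synthetic_quota(labels, target=None):
    """Return, per class, how many generated samples are needed to reach `target`.

    Hypothetical helper for illustration; not part of ORCA-WHISPER.
    """
    counts = Counter(labels)
    # Default target: the size of the largest class, so every class is levelled up to it.
    goal = max(counts.values()) if target is None else target
    # Classes already at or above the goal need no generated samples.
    return {cls: max(goal - n, 0) for cls, n in counts.items()}


# Illustrative call type labels (names and counts are made up).
labels = ["type_a"] * 5 + ["type_b"] * 2 + ["type_c"] * 1
quota = synthetic_quota(labels)
print(quota)
```

A quota like this would only say how many samples to request from a trained generator per call type; whether full balancing is actually desirable is a separate design choice that the abstract does not address.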