ORCA-WHISPER: An Automatic Killer Whale Sound Type Generation Toolkit Using Deep Learning

Interspeech, Pub Date: 2022-09-18, DOI: 10.21437/interspeech.2022-846

Christian Bergler, Alexander Barnhill, Dominik Perrin, M. Schmitt, A. Maier, E. Nöth

Pages: 2413-2417

Citations: 1

Abstract



Even today, the current understanding and interpretation of animal-specific vocalization paradigms is largely based on historical and manual data analysis considering comparatively small data corpora, primarily because of time- and human-resource limitations, next to the scarcity of available species-related machine-learning techniques. Partial human-based data inspections neither represent the overall real-world vocal repertoire, nor the variations within intra- and inter animal-specific call type portfolios, typically resulting only in small collections of category-specific ground truth data. Modern machine (deep) learning concepts are an essential requirement to identify statistically significant animal-related vocalization patterns within massive bioacoustic data archives. However, the applicability of pure supervised training approaches is challenging, due to limited call-specific ground truth data, combined with strong class-imbalances between individual call type events. The current study is the first presenting a deep bioacoustic signal generation framework, entitled ORCA-WHISPER, a Generative Adversarial Network (GAN), trained on low-resource killer whale (Orcinus orca) call type data. Besides audiovisual inspection, supervised call type classification, and model transferability, the auspicious quality of generated fake vocalizations was further demonstrated by visualizing, representing, and enhancing the real-world orca signal data manifold. Moreover, previous orca/noise segmentation results were outperformed by integrating fake signals to the original data partition.
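
The class-imbalance problem and the idea of adding generated signals to the original data partition can be illustrated with a small Python sketch. It is not the authors' pipeline: the call type labels, the counts, and the helper `synthetic_quota` are hypothetical, and the rule of topping every class up to the size of the largest one is an assumption made only for illustration.

```python
from collections import Counter


def synthetic_quota(labels, target=None):
    """Return how many generated samples each class would need.

    labels: iterable of call type labels for the real recordings.
    target: desired per-class count; defaults to the largest class size
            (an assumption for this sketch, not taken from the paper).
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # A class already at or above the target needs no synthetic samples.
    return {label: max(0, target - n) for label, n in counts.items()}


# Hypothetical, strongly imbalanced set of real call type labels.
real_labels = ["N01"] * 120 + ["N04"] * 35 + ["N09"] * 8
quota = synthetic_quota(real_labels)
print(quota)
```

In a setup like this, the quota would tell a trained generator how many fake vocalizations to draw per call type before the combined set is passed to a downstream classifier or segmentation model.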

