{"title":"Bioacoustic augmentation of Orcas using TransGAN","authors":"Nishant Yella, Manisai Eppakayala, Tauqir Pasha","doi":"10.1109/IPAS55744.2022.10052983","DOIUrl":null,"url":null,"abstract":"The Southern Resident Killer Whale (Orcinus Orca) is an apex predator in the oceans. Currently, these are listed as endangered species and have slowly declined in number over the past two decades. There is a lack of availability of data on audio vocalizations of killer whales, which in itself creates a demanding task to acquire labelled audio sets. The vocalizations of orcas are usually categorized into two groups namely, whistles and pulsed calls. There is a significant amount of scarcity on audio sets of these two types of vocalizations. Hence this creates a challenge to address the lack of availability of data on these vocalizations. Methods of data augmentations have proven over the years to be very effective in generating synthetically created data for the use of labelled training of a given feed-forward neural network. The Transformer based Generative Adversarial neural network (Trans-GAN) has performed phenomenally well on tasks pertaining to visual perception. In this paper, we would like to demonstrate the use of trans-GAN on audio datasets, which would be used to perform bioacoustics augmentation of the killer whale audio vocalizations obtained from existing open-source libraries to generate a synthetically substantial amount of audio data on the killer whale vocalizations for tasks pertaining to audio perception. To validate the Trans-GAN generated audio to the original killer Whale vocalization sample, we have implemented a time-sequence-based algorithm called Dynamic Time Wrapping (DTW), which compares the similarity index between these two audio samples.","PeriodicalId":322228,"journal":{"name":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPAS55744.2022.10052983","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The Southern Resident Killer Whale (Orcinus orca) is an apex predator of the ocean. It is currently listed as an endangered species, and its numbers have slowly declined over the past two decades. Recordings of killer whale vocalizations are scarce, which makes acquiring labelled audio sets a demanding task. Orca vocalizations are usually categorized into two groups, whistles and pulsed calls, and audio sets of both types are in short supply; addressing this scarcity is therefore a key challenge. Data augmentation methods have proven over the years to be very effective at generating synthetic data for the labelled training of a given feed-forward neural network. The Transformer-based Generative Adversarial Network (TransGAN) has performed phenomenally well on visual perception tasks. In this paper, we demonstrate the use of TransGAN on audio datasets: killer whale vocalizations obtained from existing open-source libraries are augmented to generate a substantial amount of synthetic audio data for audio perception tasks. To validate the TransGAN-generated audio against the original killer whale vocalization samples, we implement a time-sequence-based algorithm, Dynamic Time Warping (DTW), which computes a similarity index between the two audio samples.
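As a rough illustration of the DTW-based validation step described above (not the authors' implementation), the sketch below compares a real and a generated orca call by running a textbook DTW over MFCC feature sequences. The file names, sample rate, and choice of MFCC features are assumptions made for the example.

```python
# Minimal DTW similarity sketch for comparing a real vocalization with a
# TransGAN-generated one. File paths and feature settings are illustrative
# assumptions, not values from the paper.
import numpy as np
import librosa


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic dynamic-programming DTW on two sequences of shape (frames, features)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return float(cost[n, m])


# Hypothetical file names; any pair of original/generated calls would do.
real, sr = librosa.load("orca_call_real.wav", sr=22050)
fake, _ = librosa.load("orca_call_generated.wav", sr=22050)

# Frame-level MFCCs serve as the feature sequences aligned by DTW.
mfcc_real = librosa.feature.mfcc(y=real, sr=sr, n_mfcc=13).T
mfcc_fake = librosa.feature.mfcc(y=fake, sr=sr, n_mfcc=13).T

print("DTW distance (lower means more similar):", dtw_distance(mfcc_real, mfcc_fake))
```

A lower accumulated DTW cost indicates that the generated call tracks the temporal structure of the original more closely; the paper's actual similarity index and feature representation may differ.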