{"title":"Bioacoustic augmentation of Orcas using TransGAN","authors":"Nishant Yella, Manisai Eppakayala, Tauqir Pasha","doi":"10.1109/IPAS55744.2022.10052983","DOIUrl":null,"url":null,"abstract":"The Southern Resident Killer Whale (Orcinus Orca) is an apex predator in the oceans. Currently, these are listed as endangered species and have slowly declined in number over the past two decades. There is a lack of availability of data on audio vocalizations of killer whales, which in itself creates a demanding task to acquire labelled audio sets. The vocalizations of orcas are usually categorized into two groups namely, whistles and pulsed calls. There is a significant amount of scarcity on audio sets of these two types of vocalizations. Hence this creates a challenge to address the lack of availability of data on these vocalizations. Methods of data augmentations have proven over the years to be very effective in generating synthetically created data for the use of labelled training of a given feed-forward neural network. The Transformer based Generative Adversarial neural network (Trans-GAN) has performed phenomenally well on tasks pertaining to visual perception. In this paper, we would like to demonstrate the use of trans-GAN on audio datasets, which would be used to perform bioacoustics augmentation of the killer whale audio vocalizations obtained from existing open-source libraries to generate a synthetically substantial amount of audio data on the killer whale vocalizations for tasks pertaining to audio perception. To validate the Trans-GAN generated audio to the original killer Whale vocalization sample, we have implemented a time-sequence-based algorithm called Dynamic Time Wrapping (DTW), which compares the similarity index between these two audio samples.","PeriodicalId":322228,"journal":{"name":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPAS55744.2022.10052983","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The Southern Resident Killer Whale (Orcinus orca) is an apex predator of the ocean. It is currently listed as an endangered species, and its numbers have slowly declined over the past two decades. Recordings of killer whale vocalizations are scarce, which makes acquiring labelled audio sets a demanding task. Orca vocalizations are usually categorized into two groups, whistles and pulsed calls, and audio sets of both types are in short supply; addressing this scarcity is therefore a key challenge. Data augmentation methods have proven over the years to be very effective at generating synthetic data for the labelled training of a given feed-forward neural network. The Transformer-based Generative Adversarial Network (TransGAN) has performed phenomenally well on visual perception tasks. In this paper, we demonstrate the use of TransGAN on audio datasets: killer whale vocalizations obtained from existing open-source libraries are augmented to generate a substantial amount of synthetic audio data for audio perception tasks. To validate the TransGAN-generated audio against the original killer whale vocalization samples, we implement a time-sequence-based algorithm, Dynamic Time Warping (DTW), which computes a similarity index between the two audio samples.
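As a rough illustration of the DTW-based validation step described above (not the authors' implementation), the sketch below compares a real and a generated orca call by running a textbook DTW over MFCC feature sequences. The file names, sample rate, and choice of MFCC features are assumptions made for the example.

```python
# Minimal DTW similarity sketch for comparing a real vocalization with a
# TransGAN-generated one. File paths and feature settings are illustrative
# assumptions, not values from the paper.
import numpy as np
import librosa


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic dynamic-programming DTW on two sequences of shape (frames, features)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return float(cost[n, m])


# Hypothetical file names; any pair of original/generated calls would do.
real, sr = librosa.load("orca_call_real.wav", sr=22050)
fake, _ = librosa.load("orca_call_generated.wav", sr=22050)

# Frame-level MFCCs serve as the feature sequences aligned by DTW.
mfcc_real = librosa.feature.mfcc(y=real, sr=sr, n_mfcc=13).T
mfcc_fake = librosa.feature.mfcc(y=fake, sr=sr, n_mfcc=13).T

print("DTW distance (lower means more similar):", dtw_distance(mfcc_real, mfcc_fake))
```

A lower accumulated DTW cost indicates that the generated call tracks the temporal structure of the original more closely; the paper's actual similarity index and feature representation may differ.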