Machine learning for efficient segregation and labeling of potential biological sounds in long-term underwater recordings

Frontiers in Remote Sensing. Pub Date: 2024-04-25. DOI: 10.3389/frsen.2024.1390687

C. Parcerisas, Elena Schall, Kees te Velde, Dick Botteldooren, P. Devos, E. Debusschere

Abstract

Studying marine soundscapes by detecting known sound events and quantifying their spatio-temporal patterns can provide ecologically relevant information. However, exploring underwater sound data to find and identify possible sound events of interest can be highly time-intensive for human analysts. To speed up this process, we propose a novel methodology that first detects all potentially relevant acoustic events and then clusters them in an unsupervised way prior to manual revision. We demonstrate its applicability on a short deployment.

To detect acoustic events, a deep learning object detection algorithm from computer vision (YOLOv8) is re-trained to detect any (short) acoustic event. This is done by converting the audio to spectrograms using sliding windows longer than the expected sound events of interest. The model detects all events present in each window and provides their time and frequency limits. With this approach, multiple events happening simultaneously can be detected.

To further limit the human input needed to create the annotations used to train the model, we propose an active learning approach that iteratively selects the most informative audio files for subsequent manual annotation. The obtained detection models are trained and tested on a dataset from the Belgian Part of the North Sea, and then further evaluated for robustness on a freshwater dataset from major European rivers. The proposed active learning approach outperforms the random selection of files on both the marine and the freshwater datasets.

Once the events are detected, they are converted to an embedded feature space using the BioLingual model, which is trained to classify different (biological) sounds. The obtained representations are then clustered in an unsupervised way, yielding different sound classes. These classes are then manually revised.

This method can be applied to unseen data as a tool to help bioacousticians identify recurrent sounds and save time when studying their spatio-temporal patterns. It reduces the time researchers need to go through long acoustic recordings and allows a more targeted analysis. It also provides a framework to monitor soundscapes regardless of whether the sound sources are known.
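The detection step described in the abstract (sliding windows over the recording, then time and frequency limits for each detected event) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the window length, the overlap and the frequency range are hypothetical values chosen for illustration. The sketch assumes detections arrive in the normalized YOLO box format (x_center, y_center, width, height, all in [0, 1]) and that the top of the spectrogram image corresponds to the highest frequency.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, width, height), normalized


def sliding_windows(total_s: float, window_s: float, overlap: float) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of overlapping windows over a recording.

    The last window is truncated at the end of the recording.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step_s = window_s * (1.0 - overlap)
    windows = []
    n = 0
    # Multiply by the index instead of accumulating, to avoid floating-point drift.
    while n * step_s < total_s:
        start = n * step_s
        windows.append((start, min(start + window_s, total_s)))
        n += 1
    return windows


def box_to_event(
    box: Box,
    window_start_s: float,
    window_s: float,
    f_min_hz: float,
    f_max_hz: float,
) -> Tuple[float, float, float, float]:
    """Convert one normalized box to (t_start_s, t_end_s, f_low_hz, f_high_hz).

    Assumption: x spans the full window duration, and y = 0 (top of the
    image) maps to f_max_hz while y = 1 (bottom) maps to f_min_hz.
    """
    x_c, y_c, w, h = box
    t_start = window_start_s + (x_c - w / 2) * window_s
    t_end = window_start_s + (x_c + w / 2) * window_s
    band = f_max_hz - f_min_hz
    f_high = f_max_hz - (y_c - h / 2) * band
    f_low = f_max_hz - (y_c + h / 2) * band
    return t_start, t_end, f_low, f_high


if __name__ == "__main__":
    # Hypothetical settings: 60 s recording, 20 s windows, 50 % overlap.
    for start, end in sliding_windows(60.0, 20.0, 0.5):
        print(f"window {start:.1f}-{end:.1f} s")
    # A hypothetical detection in the window that starts at 10 s.
    print(box_to_event((0.5, 0.5, 0.2, 0.5), 10.0, 20.0, 0.0, 12000.0))
```

Because the windows overlap, the same event can be detected in two adjacent windows. A real pipeline would need to merge such duplicates after converting boxes to absolute times, a step this sketch leaves out.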
