Cosine similarity-based few-shot bioacoustic event detection with automatic frequency range identification in Mel-spectrograms.

IF 2.1, Zone 2 (Physics and Astrophysics), Q2 ACOUSTICS

Journal of the Acoustical Society of America Pub Date : 2025-07-01 DOI:10.1121/10.0037080

Sheng-Lun Kao, Yi-Wen Liu

Volume 158, Issue 1, pages 123-134. Publication type: Journal Article.

Citations: 0

Abstract

Cosine similarity-based few-shot bioacoustic event detection with automatic frequency range identification in Mel-spectrograms.

Few-shot sound event detection (SED) has been an appealing idea in bioacoustics because it can reduce the labor of labeling recordings to just a few positive examples. Since sounds can be represented as images in the time-frequency plane, existing few-shot SED methods often borrow techniques from object detection in image processing, such as prototypical networks. When applied to bioacoustic SED, however, prototypical networks encounter significant challenges. For instance, the main acoustic targets in a spectrogram are often small. The background is typically noisy, leading to overlap in frequency between positive and negative events. Furthermore, the positive events may be weak compared to the background. To overcome these difficulties, this study introduces an automatic frequency range identification algorithm, which is designed to handle small objects in Mel-spectrograms effectively. After the desired frequency range is identified, the system performs event detection by computing cosine similarity in a gliding-window manner. Overall, the system does not require a large amount of training data, nor does it rely on pretrained models. An F-score of 46.9% was achieved on DCASE 2024 Task 5, placing it in the top 3 and demonstrating its potential in addressing the unique challenges of bioacoustic SED.
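To make the gliding-window cosine similarity step concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the automatic frequency range identification algorithm, so the band is simply passed in by hand. The function names, the synthetic spectrogram, the template, and the 0.95 threshold are all assumptions chosen for illustration.

```python
import numpy as np


def sliding_cosine_similarity(spec, template, band):
    """Cosine similarity between a template and every window of a spectrogram.

    spec:     array of shape (n_mels, n_frames)
    template: array of shape (n_mels, width), e.g. built from a few positive examples
    band:     (lo, hi) mel-bin indices, hi exclusive; supplied manually here,
              whereas the paper identifies the range automatically
    """
    lo, hi = band
    spec_band = spec[lo:hi]
    tmpl = template[lo:hi].ravel()
    tmpl_norm = np.linalg.norm(tmpl)
    width = template.shape[1]
    n_windows = spec_band.shape[1] - width + 1
    scores = np.zeros(n_windows)
    for start in range(n_windows):
        window = spec_band[:, start:start + width].ravel()
        denom = np.linalg.norm(window) * tmpl_norm
        # Guard against all-zero windows or templates.
        scores[start] = (window @ tmpl) / denom if denom > 0 else 0.0
    return scores


def detect_events(scores, threshold):
    """Return (start, end) window-index pairs, end exclusive, where scores >= threshold."""
    above = scores >= threshold
    padded = np.concatenate(([False], above, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))


# Synthetic demo (assumed data): low-level noise with two narrow-band events.
rng = np.random.default_rng(0)
spec = rng.random((40, 200)) * 0.1
template = np.zeros((40, 5))
template[10:15, :] = 1.0          # event occupies mel bins 10..14, 5 frames long
for onset in (50, 120):
    spec[10:15, onset:onset + 5] += 1.0

scores = sliding_cosine_similarity(spec, template, band=(8, 18))
events = detect_events(scores, threshold=0.95)
print(events)
```

Restricting the comparison to a band keeps bins outside the target's frequency range from diluting the similarity score, which is the motivation the abstract gives for identifying the range first. A real system would also need to convert window indices to time stamps and merge or filter detections.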

Source journal

Journal of the Acoustical Society of America (Physics: Acoustics)

CiteScore

4.60

Self-citation rate

16.70%

Articles published

1433

Review time

4.7 months

About the journal: Since 1929, The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.