Authors: Sheng-Lun Kao, Yi-Wen Liu
Journal: Journal of the Acoustical Society of America, volume 158, issue 1, pages 123-134
Published: 2025-07-01
DOI: 10.1121/10.0037080 (https://doi.org/10.1121/10.0037080)
Cosine similarity-based few-shot bioacoustic event detection with automatic frequency range identification in Mel-spectrograms.
Few-shot sound event detection (SED) is appealing in bioacoustics because it can reduce the labor of labeling recordings to providing just a few positive examples. Since sounds can be represented as images in the time-frequency plane, existing few-shot SED methods often borrow techniques from object detection in image processing, such as prototypical networks. When applied to bioacoustic SED, however, prototypical networks encounter significant challenges. First, the main acoustic targets in a spectrogram are often small. Second, the background is typically noisy, so positive and negative events overlap in frequency. Third, the positive events may be weak compared to the background. To overcome these difficulties, this study introduces an automatic frequency range identification algorithm designed to handle small objects in Mel-spectrograms effectively. After the desired frequency range is identified, the system detects events by computing cosine similarity in a gliding-window manner. Overall, the system neither requires a large amount of training data nor relies on pretrained models. It achieved an F-score of 46.9% on DCASE 2024 Task 5, placing in the top 3 and demonstrating its potential for addressing the unique challenges of bioacoustic SED.
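The gliding-window cosine-similarity step can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not describe how the frequency range is identified, so the band is simply passed in as a parameter here, and the function names (`sliding_cosine_similarity`, `detect_events`), the synthetic spectrogram, the template, and the threshold are all assumptions made for illustration.

```python
import numpy as np


def sliding_cosine_similarity(spec, template, band):
    """Cosine similarity between a template and every window of a spectrogram.

    spec:     (n_mels, n_frames) non-negative Mel-spectrogram
    template: (n_mels, win) patch, e.g. the mean of the few labeled examples
    band:     (lo, hi) Mel-bin range to keep, hi exclusive (assumed given)
    """
    lo, hi = band
    spec_band = spec[lo:hi]                      # keep only the chosen band
    tmpl = template[lo:hi].ravel()
    tmpl = tmpl / (np.linalg.norm(tmpl) + 1e-12)
    win = template.shape[1]
    n_windows = spec_band.shape[1] - win + 1
    scores = np.empty(n_windows)
    for t in range(n_windows):                   # glide one frame at a time
        patch = spec_band[:, t:t + win].ravel()
        scores[t] = patch @ tmpl / (np.linalg.norm(patch) + 1e-12)
    return scores


def detect_events(scores, threshold):
    """Return (start, end) window-index pairs, end exclusive, above threshold."""
    above = scores > threshold
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])  # rising/falling edges
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))


# Synthetic demo: a hypothetical call in Mel bins 10..13, planted at frame 40.
rng = np.random.default_rng(0)
n_mels, n_frames, win = 32, 100, 5
template = np.zeros((n_mels, win))
template[10:14] = 1.0
spec = 0.1 * rng.random((n_mels, n_frames))      # weak background noise
spec[10:14, 40:45] += 1.0                        # the planted event
scores = sliding_cosine_similarity(spec, template, band=(8, 16))
events = detect_events(scores, threshold=0.95)
print(events)
```

Restricting the comparison to a narrow band keeps background energy outside the target's frequency range from diluting the similarity score, which is the motivation the abstract gives for identifying the range first. Window indices convert to seconds by multiplying by the spectrogram hop length divided by the sample rate.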
About the journal:
Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.