Segment Anything for Visual Bird Sound Denoising
Authors: Chenxi Zhou, Tianjiao Wan, Kele Xu, Peng Qiao, Yong Dou
Journal: IEEE Signal Processing Letters, vol. 32, pp. 1076-1080 (published 2025-02-24)
DOI: 10.1109/LSP.2025.3545005
URL: https://ieeexplore.ieee.org/document/10902138/
Journal Impact Factor: 3.2 (JCR Q2, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Current audio denoising methods perform well on synthetic noise but struggle with complex natural noise. This is especially true for bird sound recordings, which contain natural environmental sounds such as wind and rain that make it challenging to extract clean bird sounds. The problem becomes more pronounced for short and faint bird sounds, where existing methods are less effective. In this paper, we introduce BudSAM, a novel audio denoising model that brings the Segment Anything Model (SAM), originally designed for image segmentation, into the field of visual bird sound denoising. By treating audio denoising as a segmentation task, BudSAM leverages SAM's strong segmentation capabilities, and it incorporates binary cross-entropy (BCE) and Dice losses to improve the model's ability to segment weak signals, effectively isolating clean bird sounds that are often masked by background noise. Evaluated on the BirdSoundsDenoising dataset, our method achieves a 4.0% improvement in IoU (intersection over union) and a 0.77 dB increase in SDR (signal-to-distortion ratio) over state-of-the-art methods. To the best of the authors' knowledge, BudSAM is the first attempt to employ SAM for audio denoising, offering a promising direction for future research and for real-world bird sound processing.
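The abstract names two training losses (BCE and Dice) and two evaluation metrics (IoU and SDR). The sketch below gives their common textbook definitions in plain Python over flattened, equal-length sequences; it is not the authors' implementation. Assumptions: the model outputs a per-pixel probability mask over a spectrogram image, the two losses are simply summed with equal weights (the helper `bce_dice_loss` and its `w_bce`/`w_dice` parameters are hypothetical), and SDR uses the plain energy ratio 10·log10(‖s‖² / ‖s − ŝ‖²) rather than the BSS Eval variant that some benchmarks report.

```python
import math
from typing import Sequence

EPS = 1e-7  # clamp probabilities away from 0 and 1 so log() stays finite


def bce_loss(probs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean per-pixel binary cross-entropy between probabilities and a 0/1 mask."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs: Sequence[float], targets: Sequence[float], smooth: float = 1.0) -> float:
    """Soft Dice loss: 1 - (2*sum(p*y) + s) / (sum(p) + sum(y) + s)."""
    inter = sum(p * y for p, y in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def bce_dice_loss(probs: Sequence[float], targets: Sequence[float],
                  w_bce: float = 1.0, w_dice: float = 1.0) -> float:
    """Hypothetical combined objective: a weighted sum of BCE and Dice."""
    # Equal weights are an assumption, not the paper's stated configuration.
    return w_bce * bce_loss(probs, targets) + w_dice * dice_loss(probs, targets)


def iou(pred_mask: Sequence[int], true_mask: Sequence[int]) -> float:
    """Intersection over union of two binary masks (empty vs. empty counts as 1.0)."""
    inter = sum(1 for a, b in zip(pred_mask, true_mask) if a and b)
    union = sum(1 for a, b in zip(pred_mask, true_mask) if a or b)
    return 1.0 if union == 0 else inter / union


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return math.inf if error == 0 else 10.0 * math.log10(signal / error)


# Toy example: a 4-pixel "spectrogram" mask and hypothetical per-pixel probabilities.
probs = [0.9, 0.8, 0.1, 0.2]
targets = [1, 1, 0, 0]
pred_mask = [1 if p > 0.5 else 0 for p in probs]  # binarize at 0.5 before scoring IoU
print(f"BCE+Dice loss: {bce_dice_loss(probs, targets):.4f}")
print(f"IoU: {iou(pred_mask, targets):.2f}")
```

For example, `iou([1, 1, 0, 0], [1, 0, 0, 0])` returns 0.5: one shared pixel out of two in the union. Dice normalizes overlap by the size of the foreground, so it is not dominated by the many background pixels the way a per-pixel average such as BCE can be; that is the usual motivation for pairing the two when targets are small or faint.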
About the journal:
IEEE Signal Processing Letters is a monthly archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.