Title: DFEFM: Fusing frequency correlation and mel features for robust edge bird audio detection
Authors: Yingqi Wang, Luyang Zhang, Jiangjian Xie, Junguo Zhang, Rui Zhu
DOI: 10.1016/j.avrs.2025.100232
Journal: Avian Research, Vol. 16, No. 2, Article 100232 (JCR Q1, Ornithology; Impact Factor 1.6)
Published: 2025-02-25 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S2053716625000118
Citations: 0
Abstract
Passive acoustic monitoring (PAM) technology is increasingly becoming one of the mainstream methods for bird monitoring. However, detecting bird audio within complex natural acoustic environments using PAM devices remains a significant challenge. To enhance the accuracy (ACC) of bird audio detection (BAD) and reduce both false negatives and false positives, this study proposes a BAD method based on a Dual-Feature Enhancement Fusion Model (DFEFM). The method applies per-channel energy normalization (PCEN) to suppress noise in the input audio and uses mel-frequency cepstral coefficients (MFCC) and frequency correlation matrices (FCM) as input features. It performs deep feature-level fusion of MFCC and FCM along the channel dimension through two independent multi-layer convolutional network branches, and further integrates Spatial and Channel Synergistic Attention (SCSA) and Multi-Head Attention (MHA) modules to strengthen the fusion of these two deep features. Experimental results on the DCASE2018 BAD dataset show that the proposed method achieved an ACC of 91.4% and an AUC of 0.963, with false negative and false positive rates of 11.36% and 7.40%, respectively, surpassing existing methods. The method also achieved detection ACC above 92% and AUC values above 0.987 on datasets from three sites in different natural settings in Beijing. On an NVIDIA Jetson Nano, the method reached an ACC of 89.48% on audio clips averaging 10 s in length, with a response time of only 0.557 s, demonstrating excellent processing efficiency. This study provides an effective method for filtering out non-bird audio in bird vocalization monitoring devices, which helps reduce edge storage and data transmission costs and has significant application value for wild bird monitoring and ecological research.
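To make the feature pipeline concrete, the sketch below shows, in plain numpy, the three ingredients the abstract names: PCEN (using the standard formulation from Wang et al., 2017, with an IIR-smoothed energy estimate), a frequency correlation matrix built from pairwise Pearson correlations of spectrogram frequency bins, and a toy channel-dimension fusion step. This is an illustrative reconstruction, not the authors' code: the function names, parameter defaults, and the softmax-gate fusion are assumptions, MFCC extraction (typically done with a library such as librosa) is omitted, and the real DFEFM uses learned convolutional branches with SCSA/MHA attention rather than the hand-rolled reweighting shown here.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization (PCEN, Wang et al. 2017).

    E: non-negative (mel) energy spectrogram, shape (n_bands, n_frames).
    A first-order IIR filter tracks a smoothed energy M per band; each
    band is normalized by M**alpha, then compressed by the (delta, r) root.
    """
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

def frequency_correlation_matrix(S):
    """FCM: Pearson correlation between every pair of frequency-bin time
    series of a spectrogram S (n_bands, n_frames) -> (n_bands, n_bands).
    Harmonically structured bird calls yield strong off-diagonal entries.
    """
    return np.corrcoef(S)

def fuse_channels(F1, F2):
    """Toy channel-dimension fusion: concatenate two deep feature maps of
    shapes (C1, H, W) and (C2, H, W) along the channel axis, then reweight
    channels with softmax gates from global average pooling -- a crude
    stand-in for the learned SCSA/MHA modules described in the paper.
    """
    X = np.concatenate([F1, F2], axis=0)   # (C1 + C2, H, W)
    gates = X.mean(axis=(1, 2))            # one descriptor per channel
    gates = np.exp(gates - gates.max())    # numerically stable softmax
    gates /= gates.sum()
    return X * gates[:, None, None]
```

In practice the PCEN smoothing coefficient `s` trades off noise suppression against responsiveness to call onsets, and the FCM is fed to its own convolutional branch alongside the MFCC branch before fusion.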
About the journal:
Avian Research is an open access, peer-reviewed journal publishing high-quality research and review articles on all aspects of ornithology from around the world. It aims to report the latest and most significant progress in ornithology and to encourage the exchange of ideas among international ornithologists. As an open access journal, Avian Research provides a unique opportunity to publish high-quality content that is internationally accessible to any reader at no cost.