DFEFM: Fusing frequency correlation and mel features for robust edge bird audio detection

Impact Factor: 1.6 · CAS Region 2 (Biology) · JCR Q1 (Ornithology)
Yingqi Wang , Luyang Zhang , Jiangjian Xie , Junguo Zhang , Rui Zhu
{"title":"融合频率相关和mel特征的鲁棒边缘鸟音频检测","authors":"Yingqi Wang ,&nbsp;Luyang Zhang ,&nbsp;Jiangjian Xie ,&nbsp;Junguo Zhang ,&nbsp;Rui Zhu","doi":"10.1016/j.avrs.2025.100232","DOIUrl":null,"url":null,"abstract":"<div><div>Passive acoustic monitoring (PAM) technology is increasingly becoming one of the mainstream methods for bird monitoring. However, detecting bird audio within complex natural acoustic environments using PAM devices remains a significant challenge. To enhance the accuracy (ACC) of bird audio detection (BAD) and reduce both false negatives and false positives, this study proposes a BAD method based on a Dual-Feature Enhancement Fusion Model (DFEFM). This method incorporates per-channel energy normalization (PCEN) to suppress noise in the input audio and utilizes mel-frequency cepstral coefficients (MFCC) and frequency correlation matrices (FCM) as input features. It achieves deep feature-level fusion of MFCC and FCM on the channel dimension through two independent multi-layer convolutional network branches, and further integrates Spatial and Channel Synergistic Attention (SCSA) and Multi-Head Attention (MHA) modules to enhance the fusion effect of the aforementioned two deep features. Experimental results on the DCASE2018 BAD dataset show that our proposed method achieved an ACC of 91.4% and an AUC value of 0.963, with false negative and false positive rates of 11.36% and 7.40%, respectively, surpassing existing methods. The method also demonstrated detection ACC above 92% and AUC values above 0.987 on datasets from three sites of different natural scenes in Beijing. Testing on the NVIDIA Jetson Nano indicated that the method achieved an ACC of 89.48% when processing an average of 10 s of audio, with a response time of only 0.557 s, showing excellent processing efficiency. 
This study provides an effective method for filtering non-bird vocalization audio in bird vocalization monitoring devices, which helps to save edge storage and information transmission costs, and has significant application value for wild bird monitoring and ecological research.</div></div>","PeriodicalId":51311,"journal":{"name":"Avian Research","volume":"16 2","pages":"Article 100232"},"PeriodicalIF":1.6000,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DFEFM: Fusing frequency correlation and mel features for robust edge bird audio detection\",\"authors\":\"Yingqi Wang ,&nbsp;Luyang Zhang ,&nbsp;Jiangjian Xie ,&nbsp;Junguo Zhang ,&nbsp;Rui Zhu\",\"doi\":\"10.1016/j.avrs.2025.100232\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Passive acoustic monitoring (PAM) technology is increasingly becoming one of the mainstream methods for bird monitoring. However, detecting bird audio within complex natural acoustic environments using PAM devices remains a significant challenge. To enhance the accuracy (ACC) of bird audio detection (BAD) and reduce both false negatives and false positives, this study proposes a BAD method based on a Dual-Feature Enhancement Fusion Model (DFEFM). This method incorporates per-channel energy normalization (PCEN) to suppress noise in the input audio and utilizes mel-frequency cepstral coefficients (MFCC) and frequency correlation matrices (FCM) as input features. It achieves deep feature-level fusion of MFCC and FCM on the channel dimension through two independent multi-layer convolutional network branches, and further integrates Spatial and Channel Synergistic Attention (SCSA) and Multi-Head Attention (MHA) modules to enhance the fusion effect of the aforementioned two deep features. 
Experimental results on the DCASE2018 BAD dataset show that our proposed method achieved an ACC of 91.4% and an AUC value of 0.963, with false negative and false positive rates of 11.36% and 7.40%, respectively, surpassing existing methods. The method also demonstrated detection ACC above 92% and AUC values above 0.987 on datasets from three sites of different natural scenes in Beijing. Testing on the NVIDIA Jetson Nano indicated that the method achieved an ACC of 89.48% when processing an average of 10 s of audio, with a response time of only 0.557 s, showing excellent processing efficiency. This study provides an effective method for filtering non-bird vocalization audio in bird vocalization monitoring devices, which helps to save edge storage and information transmission costs, and has significant application value for wild bird monitoring and ecological research.</div></div>\",\"PeriodicalId\":51311,\"journal\":{\"name\":\"Avian Research\",\"volume\":\"16 2\",\"pages\":\"Article 100232\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-02-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Avian Research\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2053716625000118\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ORNITHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Avian 
Research","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2053716625000118","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ORNITHOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Passive acoustic monitoring (PAM) technology is increasingly becoming one of the mainstream methods for bird monitoring. However, detecting bird audio within complex natural acoustic environments using PAM devices remains a significant challenge. To enhance the accuracy (ACC) of bird audio detection (BAD) and reduce both false negatives and false positives, this study proposes a BAD method based on a Dual-Feature Enhancement Fusion Model (DFEFM). This method incorporates per-channel energy normalization (PCEN) to suppress noise in the input audio and utilizes mel-frequency cepstral coefficients (MFCC) and frequency correlation matrices (FCM) as input features. It achieves deep feature-level fusion of MFCC and FCM on the channel dimension through two independent multi-layer convolutional network branches, and further integrates Spatial and Channel Synergistic Attention (SCSA) and Multi-Head Attention (MHA) modules to enhance the fusion effect of the aforementioned two deep features. Experimental results on the DCASE2018 BAD dataset show that our proposed method achieved an ACC of 91.4% and an AUC value of 0.963, with false negative and false positive rates of 11.36% and 7.40%, respectively, surpassing existing methods. The method also demonstrated detection ACC above 92% and AUC values above 0.987 on datasets from three sites of different natural scenes in Beijing. Testing on the NVIDIA Jetson Nano indicated that the method achieved an ACC of 89.48% when processing an average of 10 s of audio, with a response time of only 0.557 s, showing excellent processing efficiency. This study provides an effective method for filtering non-bird vocalization audio in bird vocalization monitoring devices, which helps to save edge storage and information transmission costs, and has significant application value for wild bird monitoring and ecological research.
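The front end of the pipeline described above (PCEN to suppress noise, then a frequency correlation matrix alongside mel features as model inputs) can be sketched roughly as follows. The parameter values, function names, and the exact definition of the FCM here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def pcen(spec, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization on a magnitude spectrogram
    of shape (n_freq, n_frames). The smoother M is a first-order
    IIR filter along time; these are common defaults, not
    necessarily the paper's settings."""
    M = np.zeros_like(spec)
    M[:, 0] = spec[:, 0]
    for t in range(1, spec.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * spec[:, t]
    return (spec / (eps + M) ** alpha + delta) ** r - delta ** r

def frequency_correlation_matrix(spec):
    """One plausible FCM: Pearson correlation between every pair of
    frequency bins across time, giving an (n_freq, n_freq) matrix."""
    return np.corrcoef(spec)

# Toy spectrogram: 64 frequency bins x 200 time frames.
rng = np.random.default_rng(0)
spec = rng.random((64, 200)) + 1e-3

normed = pcen(spec)                         # (64, 200), noise-suppressed
fcm = frequency_correlation_matrix(normed)  # (64, 64), symmetric
```

In the paper, the two input features (MFCC and FCM) are then fed to two independent convolutional branches and fused on the channel dimension; that fusion stage is model-specific and is not sketched here.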
Source journal: Avian Research (Ornithology)
CiteScore: 2.90
Self-citation rate: 16.70%
Annual articles: 456
Review time: 46 days
Journal description: Avian Research is an open access, peer-reviewed journal publishing high quality research and review articles on all aspects of ornithology from all over the world. It aims to report the latest and most significant progress in ornithology and to encourage exchange of ideas among international ornithologists. As an open access journal, Avian Research provides a unique opportunity to publish high quality contents that will be internationally accessible to any reader at no cost.