DFEFM: Fusing frequency correlation and mel features for robust edge bird audio detection

IF 1.6 | Zone 2, Biology | JCR Q1, Ornithology
Yingqi Wang, Luyang Zhang, Jiangjian Xie, Junguo Zhang, Rui Zhu
Avian Research, vol. 16, no. 2, Article 100232. Published 2025-02-25. DOI: 10.1016/j.avrs.2025.100232
https://www.sciencedirect.com/science/article/pii/S2053716625000118

Abstract

Passive acoustic monitoring (PAM) is becoming a mainstream method for bird monitoring, but detecting bird audio with PAM devices in complex natural acoustic environments remains a significant challenge. To improve the accuracy (ACC) of bird audio detection (BAD) and reduce both false negatives and false positives, this study proposes a BAD method based on a Dual-Feature Enhancement Fusion Model (DFEFM). The method applies per-channel energy normalization (PCEN) to suppress noise in the input audio and uses mel-frequency cepstral coefficients (MFCC) and frequency correlation matrices (FCM) as input features. Two independent multi-layer convolutional network branches produce deep features from MFCC and FCM, which are fused along the channel dimension; Spatial and Channel Synergistic Attention (SCSA) and Multi-Head Attention (MHA) modules then strengthen the fusion of these two deep features. On the DCASE2018 BAD dataset, the proposed method achieved an ACC of 91.4% and an AUC of 0.963, with false negative and false positive rates of 11.36% and 7.40%, respectively, surpassing existing methods. It also reached an ACC above 92% and an AUC above 0.987 on datasets from three sites with different natural scenes in Beijing. On the NVIDIA Jetson Nano, the method achieved an ACC of 89.48% when processing an average of 10 s of audio, with a response time of 0.557 s. The method offers an effective way to filter out audio without bird vocalizations on bird vocalization monitoring devices, which saves edge storage and data transmission costs and has significant application value for wild bird monitoring and ecological research.
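The abstract reports ACC together with false negative and false positive rates. As a reading aid, the sketch below shows how these three figures are conventionally derived from clip-level binary labels. It is not the authors' evaluation code: the function name `detection_metrics` and the sample labels are hypothetical, and the standard confusion-matrix definitions are assumed (FNR = FN / (FN + TP), FPR = FP / (FP + TN)).

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return ACC, FNR and FPR for binary labels (1 = bird present, 0 = absent).

    Hypothetical helper for illustration; standard definitions are assumed.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bird clips detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bird clips missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-bird clips rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-bird clips kept

    total = tp + fn + tn + fp
    return {
        "acc": (tp + tn) / total if total else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up labels, for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

For an on-device filter, the two error rates carry different costs: a false negative discards a clip that contained a bird, while a false positive only spends storage and transmission on a clip that did not.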
Source journal: Avian Research (Ornithology)
CiteScore: 2.90
Self-citation rate: 16.70%
Articles published: 456
Review time: 46 days
About the journal: Avian Research is an open access, peer-reviewed journal publishing high-quality research and review articles on all aspects of ornithology from around the world. It aims to report the latest and most significant progress in ornithology and to encourage the exchange of ideas among international ornithologists. As an open access journal, it makes its content internationally accessible to any reader at no cost.