Weakly supervised bird-flock counting in wetlands based on multimodal optical image perception

Shuxiang Feng, Mengxue Lyu, Xuetao Han, Chang Liu, Jun Qiu
Watershed Ecology and the Environment, Volume 7, Pages 249-257. Published 2025-01-01. DOI: 10.1016/j.wsee.2025.05.006
https://www.sciencedirect.com/science/article/pii/S258947142500021X

Abstract

Wetland birds are important bio-indicators of wetland ecosystem health and play a pivotal role in ecological monitoring and conservation. This study addresses three challenges in monitoring avian populations with optical remote sensing imagery: the high cost of manual annotation, the difficulty of extracting features of small targets against complex backgrounds, and the limited adaptability of recognition to targets at multiple scales. We propose a weakly supervised bird-flock counting method that requires no location annotation. It is built on an optical image multimodal perception model that integrates optical image features with visual semantic features. After optical image feature enhancement, visual semantic features relevant to the counting task are extracted through visual cues (a counting text prompt), and a learnable feature adapter fuses the optical image features with the visual semantic features. The resulting model includes a residual connection mechanism and a multi-scale information interaction module. The residual connection mechanism alleviates interference from posture changes and complex backgrounds, and the multi-scale information interaction module handles changes in target scale through cross-scale semantic propagation. We construct an optical-image bird-flock dataset, Wetland-Bird-Count, for the coastal wetlands of the Yellow River Delta. In experiments, the proposed method achieves an MAE of 45.2 and an MSE of 54.2, which is much more accurate than other weakly supervised and unsupervised methods and close to the fully supervised counting method. These results indicate that weakly supervised cluster counting guided by optical image visual cues can improve the accuracy of bird-flock counting under lightweight annotation. This study provides a reliable quantitative analysis tool for optical image ecological monitoring.
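
The abstract reports MAE and MSE without defining them. The sketch below shows how these two errors are typically computed over per-image counts. It assumes, as is common in the counting literature, that "MSE" denotes the root of the mean squared error. The reported figures are consistent only with that reading, because a mean of squared errors can never be smaller than the square of the MAE, and 54.2 is far below 45.2 squared. The function name `counting_errors` and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def counting_errors(predicted, actual):
    """Return (MAE, root-mean-squared error) over per-image counts.

    Assumption: the abstract's "MSE" is the root form, as is common
    in counting papers; the paper's own definition is not given here.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed error per image: predicted count minus annotated count.
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Made-up counts for four images, for illustration only.
predicted = [120.0, 80.0, 310.0, 45.0]
actual = [100.0, 90.0, 300.0, 50.0]
mae, rmse = counting_errors(predicted, actual)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Both values are in units of birds per image. The root form is never smaller than the MAE, and a wide gap between the two points to a few images with large counting errors.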