Weakly supervised bird-flock counting in wetlands based on multimodal optical image perception
Shuxiang Feng, Mengxue Lyu, Xuetao Han, Chang Liu, Jun Qiu
Watershed Ecology and the Environment, volume 7, pages 249-257 (2025)
DOI: 10.1016/j.wsee.2025.05.006
https://www.sciencedirect.com/science/article/pii/S258947142500021X
Citations: 0
Abstract
Wetland birds are crucial bio-indicators of wetland ecosystem health and play a pivotal role in ecological monitoring and conservation. This study addresses three challenges in monitoring avian populations with optical remote sensing imagery: the high cost of manual annotation, the difficulty of extracting features of small targets against complex backgrounds, and the limited adaptability of recognition to targets at multiple scales. We propose a weakly supervised bird-flock counting method that requires no location annotation. It is built on an optical image multimodal perception model that integrates optical image features with visual semantic features. After optical image feature enhancement, visual semantic features relevant to the counting task are extracted through visual cues (a counting text prompt), and a learnable feature adapter fuses the optical image features with the visual semantic features. The resulting model combines a residual connection mechanism with a multi-scale information interaction module. The residual connection mechanism alleviates the interference caused by posture changes and complex backgrounds, and the multi-scale information interaction module handles changes in target scale through cross-scale semantic propagation. We construct an optical-image bird-flock dataset, Wetland-Bird-Count, for the coastal wetlands of the Yellow River Delta. In experiments, the proposed method achieves an MAE of 45.2 and an MSE of 54.2, which is considerably more accurate than other weakly supervised and unsupervised methods and close to the fully supervised counting method. These results indicate that weakly supervised cluster counting guided by optical image visual cues can improve the accuracy of bird-flock counting under lightweight annotation. The study provides a reliable quantitative analysis tool for ecological monitoring with optical imagery.
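The abstract reports MAE and MSE without defining them. The sketch below shows how such per-image counting errors are commonly computed; it is an illustration under stated assumptions, not the authors' evaluation code. In the counting literature, the figure labelled "MSE" is often the square root of the mean squared error, and whether this paper follows that convention is an assumption here. The function name `count_errors` and the sample counts are hypothetical.

```python
import math


def count_errors(predicted, actual):
    """Return (MAE, RMSE) over per-image counts.

    Assumption: "MSE" is taken as the root of the mean squared error,
    a common convention in counting benchmarks. The abstract does not
    confirm which definition the paper uses.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed error per image: predicted count minus ground-truth count.
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Hypothetical counts for three images (not data from the paper).
mae, rmse = count_errors([110, 95, 300], [100, 100, 320])
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")  # prints "MAE=11.67, RMSE=13.23"
```

Both metrics are in units of birds per image. The squared term weights large miscounts more heavily, so the second value can never be smaller than the MAE.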