Underwater small and occlusion object detection with feature fusion and global context decoupling head-based YOLO
Lei Deng, Shaojuan Luo, Chunhua He, Huapan Xiao, Heng Wu
Multimedia Systems (Journal Article), published 2024-07-17
DOI: 10.1007/s00530-024-01410-z (https://doi.org/10.1007/s00530-024-01410-z)
Citation count: 0
Abstract
Light scattering and absorption in water, together with camera or target motion, often cause blurring, distortion, and color deviation in underwater images, which poses significant challenges for underwater target detection. Numerous detectors have been proposed to address these challenges, including YOLO-series models, RCNN-based variants, and Transformer-based variants. However, these detectors often perform poorly on small targets and occluded targets. To tackle these issues, we propose a YOLO detection method based on feature fusion and a global semantic decoupling head. Specifically, we propose an efficient feature fusion module to address the loss of small-target feature information, which makes such targets difficult to detect accurately. We also use self-supervision to recalibrate the feature information between levels, which fully integrates semantic information across levels. We design a decoupling head that focuses on global context information and can better filter out complex background information, thereby achieving effective detection of targets under occlusion. Finally, we replace simple upsampling with a content-aware reassembly module in the YOLO backbone, which alleviates, to some extent, the imprecise localization and identification of small targets caused by feature loss. The experimental results indicate that the proposed method outperforms other state-of-the-art single-stage and two-stage detection networks. Specifically, on the UTDAC2020 dataset, the proposed method attains mAP50-95 and mAP50 scores of 54.4% and 87.7%, respectively.
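The two reported metrics differ only in how strictly a predicted box must overlap the ground truth. The sketch below is not the authors' evaluation code. It assumes the usual COCO-style convention, in which mAP50 uses a single IoU threshold of 0.50 and mAP50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The boxes and the helper `matched_thresholds` are hypothetical and serve only to illustrate why small targets are penalized more heavily under the stricter metric.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Thresholds at which the prediction would count as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


# Hypothetical boxes: the same 2-pixel localization error on a
# 20x20 px target and on a 200x200 px target.
small_gt, small_pred = (100, 100, 120, 120), (102, 102, 122, 122)
large_gt, large_pred = (100, 100, 300, 300), (102, 102, 302, 302)

print(len(matched_thresholds(small_pred, small_gt)))  # small target
print(len(matched_thresholds(large_pred, large_gt)))  # large target
```

With these made-up boxes, the small target passes only the looser thresholds while the large target passes all ten, even though the pixel error is identical. This is one reason a method can score well on mAP50 yet noticeably lower on mAP50-95 when small targets dominate.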
About the journal:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.