YOLOv11-OPNet: dual-modality framework for fishing vessel detection in complex long-range maritime scenarios

IF 2.4 | Zone 3, Agricultural and Forestry Sciences | JCR Q2, Fisheries

Aquaculture International | Pub Date: 2025-09-03 | DOI: 10.1007/s10499-025-02216-0

Jian Li, Hong Yu, Rui Shi, Zhibo Cui, Hongjing Dai, Zijian Wu, Yue Wang

Aquaculture International 33 (6). Full text: https://link.springer.com/article/10.1007/s10499-025-02216-0

Citations: 0

Abstract

Fishing vessel detection is a key technology for fishing vessel monitoring. In real aquaculture monitoring, however, the distant camera viewpoint makes fishing vessels and coastal watchtowers look alike, and the vessels' small size and dense spatial distribution further complicate recognition. To address these issues, we propose the YOLOv11-OPNet (YOLOv11-Optical FlowNet) framework for fishing vessel detection. First, distant targets carry too little image information, which lowers detection accuracy and often causes confusion with similar objects such as coastal watchtowers and reefs; to counter this, the study introduces an optical flow modality into the backbone network of YOLOv11. Capturing the motion of fishing vessels through optical flow analysis improves detection accuracy. Second, a cross-modal fusion module, termed LOF (light modality and Optical flow modality fusion module), is proposed that combines channel attention with a spatial attention mechanism. The module adaptively filters input features, enabling effective fusion of the optical flow and visible light modalities. Third, the PAN-FPN neck of YOLOv11 is enhanced with an additional, higher-resolution P2 layer and an embedded EMA (efficient multi-scale attention) mechanism. These changes make multi-scale feature extraction more robust and strengthen the semantic representation of small targets. Finally, to improve localization, Focaler-IoU is employed as the loss function, together with geometric parameter optimization of the bounding boxes, which improves detection recall. To evaluate the approach, comparative tests and ablation studies were conducted on a self-constructed dataset of fishing vessels from real-world aquaculture settings. Experimental results show that YOLOv11-OPNet achieves a mean average accuracy (mAcc), mean average recall (mRec), and mean average precision (mAP) of 87.1%, 82.3%, and 84.0%, respectively, improvements of 7.5%, 15%, and 11.6% over the baseline YOLOv11 model. The results indicate that YOLOv11-OPNet is well suited to detecting fishing vessels in complex real-world aquatic environments with poor image quality.
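
The abstract names Focaler-IoU as the loss function but gives no formula or hyperparameters. As a rough illustration only, the sketch below follows the commonly published form of Focaler-IoU: the plain IoU is linearly remapped from an interval [d, u] onto [0, 1] and clamped outside it, and the loss is one minus the remapped value. The function names, the box format (x1, y1, x2, y2), and the default thresholds `d=0.0` and `u=0.95` are assumptions made for this example, not values taken from the paper, and the bounding-box geometric parameter optimization mentioned in the abstract is not modeled here.

```python
from typing import Sequence

# Assumed box format for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Plain intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou(value: float, d: float = 0.0, u: float = 0.95) -> float:
    """Remap an IoU value linearly from [d, u] to [0, 1], clamping outside.

    d and u are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= d < u <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= d < u <= 1")
    return min(1.0, max(0.0, (value - d) / (u - d)))


def focaler_iou_loss(pred: Box, target: Box, d: float = 0.0, u: float = 0.95) -> float:
    """Loss = 1 - remapped IoU; zero once the IoU reaches the upper threshold u."""
    return 1.0 - focaler_iou(iou(pred, target), d, u)


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 0.0, 3.0, 2.0)
    print(focaler_iou_loss(predicted, ground_truth, d=0.2, u=0.8))
```

Narrowing the interval [d, u] changes which samples contribute gradient: pairs with IoU below `d` all receive the maximum loss, and pairs above `u` receive none, so the choice of thresholds shifts the emphasis between hard and easy samples.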


Source journal

Aquaculture International (Agricultural and Forestry Sciences: Fisheries)

CiteScore: 5.10

Self-citation rate: 6.90%

Articles published: 204

Review time: 1.0 months

Journal description: Aquaculture International is an international journal publishing original research papers, short communications, technical notes and review papers on all aspects of aquaculture. The Journal covers topics such as the biology, physiology, pathology and genetics of cultured fish, crustaceans, molluscs and plants, especially new species; water quality of supply systems, fluctuations in water quality within farms and the environmental impacts of aquacultural operations; nutrition, feeding and stocking practices, especially as they affect the health and growth rates of cultured species; sustainable production techniques; bioengineering studies on the design and management of offshore and land-based systems; the improvement of quality and marketing of farmed products; sociological and societal impacts of aquaculture, and more. This is the official Journal of the European Aquaculture Society.