Jian Li, Hong Yu, Rui Shi, Zhibo Cui, Hongjing Dai, Zijian Wu, Yue Wang
{"title":"YOLOv11-OPNet:用于复杂远程海上场景中渔船探测的双模态框架","authors":"Jian Li, Hong Yu, Rui Shi, Zhibo Cui, Hongjing Dai, Zijian Wu, Yue Wang","doi":"10.1007/s10499-025-02216-0","DOIUrl":null,"url":null,"abstract":"<div><p>Fishing vessel detection is a crucial technology for fishing vessel monitoring. However, in real aquaculture monitoring, the distant camera viewpoint causes fishing vessels and coastal watchtowers to share similar visual features, while the vessels’ small size and dense spatial distribution further complicate recognition. To solve the aforementioned issues, we propose the YOLOv11-OPNet (YOLOv11-Optical FlowNet) framework for fishing vessel detection. Firstly, with the challenge of low detection accuracy caused by insufficient image information of distant targets, which often leads to confusion with similar objects such as coastal watchtowers and reefs, this study introduces an optical flow modality into the backbone network of YOLOv11. By capturing the dynamic characteristics of fishing vessels through optical flow analysis, the model achieves improved detection accuracy. Furthermore, a cross-modal fusion module—termed LOF (light modality and Optical flow modality fusion module)—is proposed, integrating channel attention with a spatial attention mechanism. This module adaptively filters input features, facilitating effective fusion between optical flow and visible light modalities. Additionally, the PAN-FAN neck architecture of YOLOv11 is enhanced by incorporating an additional P2 layer with higher resolution and embedding the EMA (efficient multi-scale attention) mechanism. These improvements enable more robust multi-scale feature extraction and significantly enhance the semantic representation of small targets. Finally, to optimize the localization performance, Focaler-IoU is employed as the loss function, together with geometric parameter optimization of the bounding boxes, leading to improved detection recall. To evaluate the effectiveness of the proposed approach, comprehensive comparative tests and ablation studies were conducted on a self-constructed dataset of real-world farmed fishing vessels. Experimental results demonstrate that YOLOv11-OPNet achieves mean average accuracy (mAcc), recall (mRec), and precision (mAP) of 87.1%, 82.3%, and 84.0%, respectively. Compared to the baseline YOLOv11 model, these represent improvements of 7.5%, 15%, and 11.6%, respectively. The results confirm that YOLOv11-OPNet is well-suited for detecting fishing vessels in complex real-world aquatic environments characterized by poor image quality.</p></div>","PeriodicalId":8122,"journal":{"name":"Aquaculture International","volume":"33 6","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"YOLOv11-OPNet: dual-modality framework for fishing vessel detection in complex long-range maritime scenarios\",\"authors\":\"Jian Li, Hong Yu, Rui Shi, Zhibo Cui, Hongjing Dai, Zijian Wu, Yue Wang\",\"doi\":\"10.1007/s10499-025-02216-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Fishing vessel detection is a crucial technology for fishing vessel monitoring. However, in real aquaculture monitoring, the distant camera viewpoint causes fishing vessels and coastal watchtowers to share similar visual features, while the vessels’ small size and dense spatial distribution further complicate recognition. 
To solve the aforementioned issues, we propose the YOLOv11-OPNet (YOLOv11-Optical FlowNet) framework for fishing vessel detection. Firstly, with the challenge of low detection accuracy caused by insufficient image information of distant targets, which often leads to confusion with similar objects such as coastal watchtowers and reefs, this study introduces an optical flow modality into the backbone network of YOLOv11. By capturing the dynamic characteristics of fishing vessels through optical flow analysis, the model achieves improved detection accuracy. Furthermore, a cross-modal fusion module—termed LOF (light modality and Optical flow modality fusion module)—is proposed, integrating channel attention with a spatial attention mechanism. This module adaptively filters input features, facilitating effective fusion between optical flow and visible light modalities. Additionally, the PAN-FAN neck architecture of YOLOv11 is enhanced by incorporating an additional P2 layer with higher resolution and embedding the EMA (efficient multi-scale attention) mechanism. These improvements enable more robust multi-scale feature extraction and significantly enhance the semantic representation of small targets. Finally, to optimize the localization performance, Focaler-IoU is employed as the loss function, together with geometric parameter optimization of the bounding boxes, leading to improved detection recall. To evaluate the effectiveness of the proposed approach, comprehensive comparative tests and ablation studies were conducted on a self-constructed dataset of real-world farmed fishing vessels. Experimental results demonstrate that YOLOv11-OPNet achieves mean average accuracy (mAcc), recall (mRec), and precision (mAP) of 87.1%, 82.3%, and 84.0%, respectively. Compared to the baseline YOLOv11 model, these represent improvements of 7.5%, 15%, and 11.6%, respectively. The results confirm that YOLOv11-OPNet is well-suited for detecting fishing vessels in complex real-world aquatic environments characterized by poor image quality.</p></div>\",\"PeriodicalId\":8122,\"journal\":{\"name\":\"Aquaculture International\",\"volume\":\"33 6\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2025-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Aquaculture International\",\"FirstCategoryId\":\"97\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10499-025-02216-0\",\"RegionNum\":3,\"RegionCategory\":\"农林科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"FISHERIES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Aquaculture International","FirstCategoryId":"97","ListUrlMain":"https://link.springer.com/article/10.1007/s10499-025-02216-0","RegionNum":3,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"FISHERIES","Score":null,"Total":0}
YOLOv11-OPNet: dual-modality framework for fishing vessel detection in complex long-range maritime scenarios
Fishing vessel detection is a crucial technology for fishing vessel monitoring. However, in real aquaculture monitoring, the distant camera viewpoint causes fishing vessels and coastal watchtowers to share similar visual features, while the vessels’ small size and dense spatial distribution further complicate recognition. To solve these issues, we propose the YOLOv11-OPNet (YOLOv11-Optical FlowNet) framework for fishing vessel detection. First, to address the low detection accuracy caused by the limited image information of distant targets, which are often confused with similar objects such as coastal watchtowers and reefs, this study introduces an optical flow modality into the backbone network of YOLOv11. By capturing the dynamic characteristics of fishing vessels through optical flow analysis, the model achieves higher detection accuracy. Furthermore, a cross-modal fusion module, termed LOF (light modality and optical flow modality fusion module), is proposed, integrating channel attention with a spatial attention mechanism. This module adaptively filters the input features, enabling effective fusion of the optical flow and visible light modalities. Additionally, the PAN-FPN neck architecture of YOLOv11 is enhanced by incorporating an additional, higher-resolution P2 layer and embedding the EMA (efficient multi-scale attention) mechanism. These improvements yield more robust multi-scale feature extraction and significantly strengthen the semantic representation of small targets. Finally, to improve localization, Focaler-IoU is employed as the loss function, together with optimization of the geometric parameters of the bounding boxes, which raises detection recall. To evaluate the effectiveness of the proposed approach, comprehensive comparative tests and ablation studies were conducted on a self-constructed dataset of real-world farmed fishing vessels. Experimental results show that YOLOv11-OPNet achieves a mean accuracy (mAcc), mean recall (mRec), and mean average precision (mAP) of 87.1%, 82.3%, and 84.0%, respectively. Compared with the baseline YOLOv11 model, these represent improvements of 7.5%, 15%, and 11.6%, respectively. The results confirm that YOLOv11-OPNet is well suited to detecting fishing vessels in complex real-world aquatic environments characterized by poor image quality.
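For readers unfamiliar with the Focaler-IoU loss mentioned in the abstract, the sketch below illustrates the general idea as published in the Focaler-IoU literature: the plain IoU is linearly re-mapped onto an interval [d, u] before being turned into a loss, so that training emphasis can be shifted toward easier or harder samples. This is a minimal PyTorch illustration, not code from the paper; the box format (x1, y1, x2, y2), the default thresholds d and u, and the function name are illustrative assumptions, and the authors may combine the re-mapping with a different base IoU variant.

```python
import torch


def focaler_iou_loss(pred_boxes: torch.Tensor,
                     target_boxes: torch.Tensor,
                     d: float = 0.0,
                     u: float = 0.95) -> torch.Tensor:
    """Illustrative Focaler-IoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.

    The IoU is linearly re-mapped onto [d, u] (clamped to [0, 1]) and the loss is
    1 - IoU_focaler, averaged over the batch.
    """
    # Intersection rectangle (elementwise over the batch)
    x1 = torch.max(pred_boxes[:, 0], target_boxes[:, 0])
    y1 = torch.max(pred_boxes[:, 1], target_boxes[:, 1])
    x2 = torch.min(pred_boxes[:, 2], target_boxes[:, 2])
    y2 = torch.min(pred_boxes[:, 3], target_boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union = area(pred) + area(target) - intersection
    area_p = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_t = (target_boxes[:, 2] - target_boxes[:, 0]) * (target_boxes[:, 3] - target_boxes[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # Focaler re-mapping: 0 below d, 1 above u, linear in between
    iou_focaler = ((iou - d) / (u - d)).clamp(min=0.0, max=1.0)
    return (1.0 - iou_focaler).mean()


# Example usage with a single predicted/ground-truth box pair
pred = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
gt = torch.tensor([[12.0, 12.0, 48.0, 52.0]])
print(focaler_iou_loss(pred, gt))
```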
Journal description:
Aquaculture International is an international journal publishing original research papers, short communications, technical notes and review papers on all aspects of aquaculture.
The Journal covers topics such as the biology, physiology, pathology and genetics of cultured fish, crustaceans, molluscs and plants, especially new species; water quality of supply systems, fluctuations in water quality within farms and the environmental impacts of aquacultural operations; nutrition, feeding and stocking practices, especially as they affect the health and growth rates of cultured species; sustainable production techniques; bioengineering studies on the design and management of offshore and land-based systems; the improvement of quality and marketing of farmed products; sociological and societal impacts of aquaculture, and more.
This is the official Journal of the European Aquaculture Society.