MSNet: A multispectral-image driven rapeseed canopy instance segmentation network

Yuang Yang, Xiaole Wang, Fugui Zhang, Zhenchao Wu, Yu Wang, Yujie Liu, Xuan Lv, Bowen Luo, Liqing Chen, Yang Yang

Artificial Intelligence in Agriculture, 15(4), pp. 642-658, 2025. DOI: 10.1016/j.aiia.2025.05.008
Rapeseed plant counts and canopy area growth are crucial phenotypic indicators of growth status, so accurate identification of rapeseed targets and their growth regions provides significant data support for phenotypic analysis and breeding research. However, in natural field environments, rapeseed detection remains a substantial challenge because RGB-only modalities offer limited feature representation. To address this challenge, this study proposes MSNet, a dual-modal instance segmentation network based on YOLOv11n-seg that integrates RGB and near-infrared (NIR) modalities. The main improvements comprise three fusion-location strategies (frontend fusion, mid-stage fusion, and backend fusion) and a newly introduced Hierarchical Attention Fusion Block (HAFB) for multimodal feature fusion. Comparative experiments on fusion locations indicate that the mid-stage strategy achieves the best balance between detection accuracy and parameter efficiency, improving mAP50:95 by up to 3.5 % over the baseline network. After introducing the HAFB module, the MSNet-H-HAFB model improves mAP50:95 by 6.5 % relative to the baseline, with less than a 38 % increase in parameter count. Notably, mid-stage fusion consistently delivered the best detection performance across all experiments, providing clear design guidance for selecting fusion locations in future multimodal networks. In addition, comparisons with various RGB-only instance segmentation models show that all proposed MSNet-HAFB fusion models significantly outperform single-modal models in rapeseed count detection tasks, confirming the potential advantages of multispectral fusion strategies in agricultural target recognition. Finally, MSNet was applied in an agricultural case study covering vegetation index analysis and frost damage classification: ZN6–2836 and ZS11 were predicted to be potential superior varieties, and the EVI2 vegetation index achieved the best performance in rapeseed frost damage classification.
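The abstract names the fusion components but not their implementation. As a rough, hypothetical illustration of what mid-stage dual-branch RGB+NIR fusion with an attention-based fusion block could look like in PyTorch (the block below is a generic channel-attention fusion, not the paper's actual HAFB; all layer sizes and branch depths are illustrative assumptions):

```python
# Hypothetical sketch only: a mid-stage RGB+NIR fusion backbone with a simple
# channel-attention fusion block. This is NOT the paper's HAFB or MSNet code.
import torch
import torch.nn as nn

class AttentionFusionBlock(nn.Module):
    """Fuses RGB and NIR feature maps via channel attention (illustrative)."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb: torch.Tensor, nir: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, nir], dim=1)   # (B, 2C, H, W)
        w = self.gate(self.pool(x))        # per-channel weights in (0, 1)
        return self.project(x * w)         # fused features, (B, C, H, W)

def conv_stage(cin: int, cout: int) -> nn.Sequential:
    """One downsampling stage: stride-2 conv + BN + SiLU."""
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.SiLU(inplace=True))

class MidFusionBackbone(nn.Module):
    """Two parallel stems; features fuse mid-way, then shared layers continue."""
    def __init__(self):
        super().__init__()
        self.rgb_stem = nn.Sequential(conv_stage(3, 32), conv_stage(32, 64))
        self.nir_stem = nn.Sequential(conv_stage(1, 32), conv_stage(32, 64))
        self.fuse = AttentionFusionBlock(64)
        self.shared = nn.Sequential(conv_stage(64, 128), conv_stage(128, 256))

    def forward(self, rgb: torch.Tensor, nir: torch.Tensor) -> torch.Tensor:
        return self.shared(self.fuse(self.rgb_stem(rgb), self.nir_stem(nir)))

if __name__ == "__main__":
    rgb = torch.randn(2, 3, 640, 640)
    nir = torch.randn(2, 1, 640, 640)
    print(MidFusionBackbone()(rgb, nir).shape)  # torch.Size([2, 256, 40, 40])
```

One intuition consistent with the reported accuracy/parameter trade-off is visible in this structure: each modality keeps only a shallow modality-specific stem, while the deeper, parameter-heavy layers are shared after fusion, unlike backend fusion, which duplicates most of the backbone per modality.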
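EVI2, reported in the abstract as the best-performing index for frost damage classification, is the standard two-band Enhanced Vegetation Index (Jiang et al., 2008): EVI2 = 2.5 (NIR − Red) / (NIR + 2.4 Red + 1). A minimal per-pixel computation, assuming surface reflectance scaled to [0, 1] and a hypothetical canopy mask such as one produced by an instance segmentation model:

```python
import numpy as np

def evi2(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Two-band Enhanced Vegetation Index (Jiang et al., 2008).

    Expects surface reflectance in [0, 1]; the +1 term keeps the
    denominator positive over that range.
    """
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

# Example: mean EVI2 over a segmented canopy region.
# Random reflectance and an all-ones mask stand in for real band data
# and a real instance mask here.
nir = np.random.rand(512, 512).astype(np.float32)
red = np.random.rand(512, 512).astype(np.float32)
mask = np.ones((512, 512), dtype=bool)  # placeholder canopy mask
canopy_evi2 = evi2(nir, red)[mask].mean()
print(f"mean canopy EVI2: {canopy_evi2:.3f}")
```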