Zhuangzhuang Du, Meng Cui, Xianbao Xu, Zhuangzhuang Bai, Jie Han, Wanchao Li, Jianan Yang, Xiaohang Liu, Cong Wang, Daoliang Li
Harnessing multimodal data fusion to advance accurate identification of fish feeding intensity
Biosystems Engineering, volume 246, pages 135-149. Published 2024-08-06. Journal Article.
DOI: 10.1016/j.biosystemseng.2024.08.001
URL: https://www.sciencedirect.com/science/article/pii/S1537511024001739
Journal impact factor: 4.4. JCR: Q1 (Agricultural Engineering).
Citation count: 0
Abstract
Accurately identifying fish feeding intensity plays a vital role in aquaculture. Traditional methods rely on a single modality (e.g., water quality, vision, or audio) and often lack a comprehensive representation, which leads to low identification accuracy. In contrast, multimodal fusion methods combine features from different modalities to obtain richer target features, thereby significantly enhancing the performance of fish feeding intensity assessment (FFIA). This work introduces a multimodal dataset called MRS-FFIA, which consists of 7611 labelled audio, video, and acoustic data samples divided into four feeding intensity classes (strong, medium, weak, and none). To address the limitations of single-modality methods, a Multimodal Fusion of Fish Feeding Intensity fusion (MFFFI) model is proposed. The MFFFI model first extracts deep features from three modalities: audio (Mel), video (RGB), and acoustic (SI). Image stitching techniques are then used to fuse the extracted features. Finally, the fused features are passed through a classifier to obtain the result. Test results show that the fused multimodal information reaches an accuracy of 99.26%, an improvement of 12.80%, 13.77%, and 2.86% over the best single-modality results for audio, video, and acoustic data, respectively. This demonstrates that the proposed method classifies fish feeding intensity more accurately than single-modality methods. In addition, compared with mainstream single-modality approaches, the model improves accuracy by 1.5%–10.8% and is noticeably more lightweight. The multimodal fusion method can therefore effectively optimise feeding decisions and provides technical support for the development of intelligent feeding systems.
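The abstract's three-step pipeline (per-modality features, fusion by image stitching, classification) can be illustrated with a minimal sketch. This is not the authors' MFFFI implementation: it assumes that "stitching" means placing equally tall 2-D feature maps side by side, and it swaps in a toy linear classifier for whatever classifier the paper uses. The names `stitch_horizontally`, `classify`, and `LABELS` are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # rows x columns

# The four feeding-intensity classes named in the abstract.
LABELS = ["strong", "medium", "weak", "none"]


def stitch_horizontally(maps: List[FeatureMap]) -> FeatureMap:
    """Place feature maps side by side (assumed meaning of 'stitching').

    All maps must have the same number of rows.
    """
    heights = {len(m) for m in maps}
    if len(heights) != 1:
        raise ValueError("all feature maps need the same number of rows")
    rows = heights.pop()
    return [[v for m in maps for v in m[r]] for r in range(rows)]


def classify(fused: FeatureMap, weights: List[List[float]]) -> str:
    """Toy linear classifier (an assumption, not the paper's classifier).

    One weight vector per label; returns the label with the highest score.
    """
    flat = [v for row in fused for v in row]
    scores = [sum(w * x for w, x in zip(ws, flat)) for ws in weights]
    return LABELS[scores.index(max(scores))]


# Hypothetical stand-ins for the Mel, RGB, and SI feature maps.
mel = [[1.0, 0.0], [0.0, 1.0]]
rgb = [[0.5], [0.5]]
si = [[2.0, 2.0, 2.0], [0.0, 0.0, 0.0]]

fused = stitch_horizontally([mel, rgb, si])
```

In a real system the three feature maps would come from learned extractors and the weights from training; the sketch only shows how stitched features form one input for a single classifier.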
About the journal:
Biosystems Engineering publishes research in engineering and the physical sciences that represents advances in understanding or modelling of the performance of biological systems for sustainable developments in land use and the environment, agriculture and amenity, bioproduction processes and the food chain. The subject matter of the journal reflects the wide range and interdisciplinary nature of research in engineering for biological systems.