Harnessing multimodal data fusion to advance accurate identification of fish feeding intensity

IF 4.4, Zone 1 Agricultural and Forestry Sciences, Q1 AGRICULTURAL ENGINEERING

Biosystems Engineering Pub Date : 2024-08-06 DOI:10.1016/j.biosystemseng.2024.08.001

Zhuangzhuang Du , Meng Cui , Xianbao Xu , Zhuangzhuang Bai , Jie Han , Wanchao Li , Jianan Yang , Xiaohang Liu , Cong Wang , Daoliang Li

Biosystems Engineering, Volume 246, Pages 135-149. Journal Article. https://www.sciencedirect.com/science/article/pii/S1537511024001739

Citations: 0

Abstract

Accurately identifying fish feeding intensity plays a vital role in aquaculture. Traditional methods are limited to a single modality (e.g., water quality, vision, audio) and often lack comprehensive representation, leading to low identification accuracy. In contrast, multimodal fusion methods combine features from different modalities to obtain richer target features, thereby significantly enhancing the performance of fish feeding intensity assessment (FFIA). In this work, a multimodal dataset called MRS-FFIA is introduced. The MRS-FFIA dataset consists of 7611 labelled audio, video and acoustic samples, divided into four feeding intensity classes (strong, medium, weak, and none). To address the limitations of single-modality methods, a Multimodal Fusion of Fish Feeding Intensity (MFFFI) model is proposed. The MFFFI model first extracts deep features from three modalities: audio (Mel), video (RGB), and acoustic (SI). Then, image stitching techniques are employed to fuse these extracted features. Finally, the fused features are passed through a classifier to obtain the results. The test results show that the fused multimodal model reaches an accuracy of 99.26%, an improvement of 12.80%, 13.77%, and 2.86% over the best single-modality results for audio, video and acoustic data, respectively. This result demonstrates that the proposed method classifies fish feeding intensity more accurately than single-modality methods. In addition, compared with mainstream single-modality approaches, the model improves accuracy by 1.5%–10.8% and is noticeably more lightweight. The multimodal fusion method allows feeding decisions to be optimised effectively, providing technical support for the development of intelligent feeding systems.
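The three-stage pipeline described in the abstract (per-modality feature extraction, fusion by stitching, classification) can be illustrated with a minimal sketch. This is not the authors' MFFFI implementation: the feature maps here are random stand-ins for the outputs of hypothetical per-modality backbones, "stitching" is assumed to mean placing the maps side by side, and the classifier is an untrained linear layer with softmax. All function names and sizes are illustrative assumptions.

```python
import math
import random

# The four feeding intensity labels named in the abstract.
CLASSES = ("strong", "medium", "weak", "none")


def stitch_features(mel_feat, rgb_feat, si_feat):
    """Fuse three H x W feature maps by joining their rows side by side.

    Assumption: 'image stitching' is modelled as horizontal concatenation.
    """
    maps = (mel_feat, rgb_feat, si_feat)
    if len({len(m) for m in maps}) != 1:
        raise ValueError("feature maps must have the same number of rows")
    return [a + b + c for a, b, c in zip(*maps)]


def classify(fused, weights, bias):
    """Flatten the fused map, apply a linear layer, return softmax probabilities."""
    flat = [v for row in fused for v in row]
    logits = [sum(w * v for w, v in zip(w_row, flat)) + b
              for w_row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_map(rng, height, width):
    """Hypothetical stand-in for a deep feature map from one modality."""
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


rng = random.Random(0)
mel_feat = random_map(rng, 4, 4)  # audio (Mel) features
rgb_feat = random_map(rng, 4, 4)  # video (RGB) features
si_feat = random_map(rng, 4, 4)   # acoustic (SI) features

fused = stitch_features(mel_feat, rgb_feat, si_feat)  # 4 rows x 12 columns
n_inputs = len(fused) * len(fused[0])

# Untrained, random weights: the prediction below is meaningless and only
# shows the data flow from fused features to a class label.
weights = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in CLASSES]
bias = [0.0] * len(CLASSES)

probs = classify(fused, weights, bias)
predicted = CLASSES[probs.index(max(probs))]
print(len(fused), len(fused[0]), predicted)
```

In a real system the random maps would be replaced by learned features and the linear layer by a trained classifier; the sketch only shows how stitched features from the three modalities reach a single four-class decision.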




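Source journal

Biosystems Engineering (Agricultural and Forestry Sciences, Agricultural Engineering)

CiteScore

10.60

Self-citation rate

7.80%

Articles published

239

Review time

53 days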

Journal description: Biosystems Engineering publishes research in engineering and the physical sciences that represents advances in the understanding or modelling of the performance of biological systems for sustainable developments in land use and the environment, agriculture and amenity, bioproduction processes and the food chain. The subject matter of the journal reflects the wide range and interdisciplinary nature of research in engineering for biological systems.