FishDetectLLM: Multimodal instruction tuning with large language models for fish detection

IF 7.2 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jiaxin Zhu , Shibai Yin , Xin Liu , Xingyang Wang , Yee-Hong Yang
{"title":"FishDetectLLM: Multimodal instruction tuning with large language models for fish detection","authors":"Jiaxin Zhu ,&nbsp;Shibai Yin ,&nbsp;Xin Liu ,&nbsp;Xingyang Wang ,&nbsp;Yee-Hong Yang","doi":"10.1016/j.knosys.2025.113418","DOIUrl":null,"url":null,"abstract":"<div><div>Aquatic species play crucial roles in global ecosystems but are increasingly threatened by factors such as overfishing, coastal development and climate change. Existing deep learning methods address these challenges by employing powerful networks and large-scale, diverse datasets, separately tackling species recognition and trait identification during ongoing monitoring. However, they often exhibit limited generalization ability. Inspired by the human ability to quickly identify fish species and their locations with just a glance at an underwater image or scene, we introduce FishDetectLLM—a framework built on the lightweight TinyLLaVA architecture. FishDetectLLM utilizes the powerful reasoning capabilities and vast world knowledge of large language models (LLMs) to address the fish detection problem, providing both fish classification results and predicted bounding boxes for fish. Specifically, we create instruction dialogues for fish detection that connect fish taxonomy with classification descriptions and map location descriptions to the corresponding coordinates of bounding box in the input images from the recently released large-scale FishNet dataset. Then, we pretrain and fine-tune FishDetectLLM to achieve fish detection using the created dataset, leveraging the principle of augmenting human knowledge. Our results show that FishDetectLLM significantly outperforms existing multimodal LLMs and task-specific methods. Unlike conventional detection architectures that struggle to generalize beyond the training data, FishDetectLLM exhibits strong generalization capabilities, achieving robust performance on unseen data. This innovation paves the way for future applications of MLLMs in full research and offers valuable tools for the conservation of fish biodiversity.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"318 ","pages":"Article 113418"},"PeriodicalIF":7.2000,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125004654","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Aquatic species play crucial roles in global ecosystems but are increasingly threatened by factors such as overfishing, coastal development and climate change. Existing deep learning methods address these challenges by employing powerful networks and large-scale, diverse datasets, separately tackling species recognition and trait identification during ongoing monitoring. However, they often exhibit limited generalization ability. Inspired by the human ability to quickly identify fish species and their locations with just a glance at an underwater image or scene, we introduce FishDetectLLM—a framework built on the lightweight TinyLLaVA architecture. FishDetectLLM utilizes the powerful reasoning capabilities and vast world knowledge of large language models (LLMs) to address the fish detection problem, providing both fish classification results and predicted bounding boxes for fish. Specifically, we create instruction dialogues for fish detection that connect fish taxonomy with classification descriptions and map location descriptions to the corresponding coordinates of bounding box in the input images from the recently released large-scale FishNet dataset. Then, we pretrain and fine-tune FishDetectLLM to achieve fish detection using the created dataset, leveraging the principle of augmenting human knowledge. Our results show that FishDetectLLM significantly outperforms existing multimodal LLMs and task-specific methods. Unlike conventional detection architectures that struggle to generalize beyond the training data, FishDetectLLM exhibits strong generalization capabilities, achieving robust performance on unseen data. This innovation paves the way for future applications of MLLMs in full research and offers valuable tools for the conservation of fish biodiversity.
FishDetectLLM: 利用大型语言模型进行鱼类检测的多模式指令调整
水生物种在全球生态系统中发挥着至关重要的作用,但它们日益受到过度捕捞、沿海开发和气候变化等因素的威胁。现有的深度学习方法通过使用强大的网络和大规模、多样化的数据集来解决这些挑战,在持续监测期间分别处理物种识别和特征识别。然而,他们往往表现出有限的泛化能力。受人类快速识别鱼类物种及其位置的能力的启发,只需瞥一眼水下图像或场景,我们介绍了fishdetectllm -一个基于轻量级TinyLLaVA架构的框架。FishDetectLLM利用大语言模型(llm)强大的推理能力和广泛的世界知识来解决鱼类检测问题,既提供鱼类分类结果,又提供鱼类的预测边界盒。具体来说,我们创建了用于鱼类检测的指令对话,将鱼类分类与分类描述联系起来,并将位置描述映射到来自最近发布的大规模渔网数据集的输入图像中的相应边界框坐标。然后,我们利用增强人类知识的原理,利用创建的数据集对FishDetectLLM进行预训练和微调,以实现鱼类检测。我们的研究结果表明,FishDetectLLM显著优于现有的多模态llm和特定任务方法。与难以泛化训练数据的传统检测架构不同,FishDetectLLM展示了强大的泛化能力,在未见过的数据上实现了稳健的性能。这一创新为今后mlm在全面研究中的应用铺平了道路,并为鱼类生物多样性的保护提供了有价值的工具。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Knowledge-Based Systems
Knowledge-Based Systems 工程技术-计算机:人工智能
CiteScore
14.80
自引率
12.50%
发文量
1245
审稿时长
7.8 months
期刊介绍: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信