Jiaxin Zhu , Shibai Yin , Xin Liu , Xingyang Wang , Yee-Hong Yang
{"title":"FishDetectLLM: 利用大型语言模型进行鱼类检测的多模式指令调整","authors":"Jiaxin Zhu , Shibai Yin , Xin Liu , Xingyang Wang , Yee-Hong Yang","doi":"10.1016/j.knosys.2025.113418","DOIUrl":null,"url":null,"abstract":"<div><div>Aquatic species play crucial roles in global ecosystems but are increasingly threatened by factors such as overfishing, coastal development and climate change. Existing deep learning methods address these challenges by employing powerful networks and large-scale, diverse datasets, separately tackling species recognition and trait identification during ongoing monitoring. However, they often exhibit limited generalization ability. Inspired by the human ability to quickly identify fish species and their locations with just a glance at an underwater image or scene, we introduce FishDetectLLM—a framework built on the lightweight TinyLLaVA architecture. FishDetectLLM utilizes the powerful reasoning capabilities and vast world knowledge of large language models (LLMs) to address the fish detection problem, providing both fish classification results and predicted bounding boxes for fish. Specifically, we create instruction dialogues for fish detection that connect fish taxonomy with classification descriptions and map location descriptions to the corresponding coordinates of bounding box in the input images from the recently released large-scale FishNet dataset. Then, we pretrain and fine-tune FishDetectLLM to achieve fish detection using the created dataset, leveraging the principle of augmenting human knowledge. Our results show that FishDetectLLM significantly outperforms existing multimodal LLMs and task-specific methods. Unlike conventional detection architectures that struggle to generalize beyond the training data, FishDetectLLM exhibits strong generalization capabilities, achieving robust performance on unseen data. This innovation paves the way for future applications of MLLMs in full research and offers valuable tools for the conservation of fish biodiversity.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"318 ","pages":"Article 113418"},"PeriodicalIF":7.2000,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FishDetectLLM: Multimodal instruction tuning with large language models for fish detection\",\"authors\":\"Jiaxin Zhu , Shibai Yin , Xin Liu , Xingyang Wang , Yee-Hong Yang\",\"doi\":\"10.1016/j.knosys.2025.113418\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Aquatic species play crucial roles in global ecosystems but are increasingly threatened by factors such as overfishing, coastal development and climate change. Existing deep learning methods address these challenges by employing powerful networks and large-scale, diverse datasets, separately tackling species recognition and trait identification during ongoing monitoring. However, they often exhibit limited generalization ability. Inspired by the human ability to quickly identify fish species and their locations with just a glance at an underwater image or scene, we introduce FishDetectLLM—a framework built on the lightweight TinyLLaVA architecture. FishDetectLLM utilizes the powerful reasoning capabilities and vast world knowledge of large language models (LLMs) to address the fish detection problem, providing both fish classification results and predicted bounding boxes for fish. Specifically, we create instruction dialogues for fish detection that connect fish taxonomy with classification descriptions and map location descriptions to the corresponding coordinates of bounding box in the input images from the recently released large-scale FishNet dataset. Then, we pretrain and fine-tune FishDetectLLM to achieve fish detection using the created dataset, leveraging the principle of augmenting human knowledge. Our results show that FishDetectLLM significantly outperforms existing multimodal LLMs and task-specific methods. Unlike conventional detection architectures that struggle to generalize beyond the training data, FishDetectLLM exhibits strong generalization capabilities, achieving robust performance on unseen data. This innovation paves the way for future applications of MLLMs in full research and offers valuable tools for the conservation of fish biodiversity.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"318 \",\"pages\":\"Article 113418\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2025-04-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125004654\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125004654","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
FishDetectLLM: Multimodal instruction tuning with large language models for fish detection
Aquatic species play crucial roles in global ecosystems but are increasingly threatened by factors such as overfishing, coastal development and climate change. Existing deep learning methods address these challenges by employing powerful networks and large-scale, diverse datasets, separately tackling species recognition and trait identification during ongoing monitoring. However, they often exhibit limited generalization ability. Inspired by the human ability to quickly identify fish species and their locations with just a glance at an underwater image or scene, we introduce FishDetectLLM—a framework built on the lightweight TinyLLaVA architecture. FishDetectLLM utilizes the powerful reasoning capabilities and vast world knowledge of large language models (LLMs) to address the fish detection problem, providing both fish classification results and predicted bounding boxes for fish. Specifically, we create instruction dialogues for fish detection that connect fish taxonomy with classification descriptions and map location descriptions to the corresponding coordinates of bounding box in the input images from the recently released large-scale FishNet dataset. Then, we pretrain and fine-tune FishDetectLLM to achieve fish detection using the created dataset, leveraging the principle of augmenting human knowledge. Our results show that FishDetectLLM significantly outperforms existing multimodal LLMs and task-specific methods. Unlike conventional detection architectures that struggle to generalize beyond the training data, FishDetectLLM exhibits strong generalization capabilities, achieving robust performance on unseen data. This innovation paves the way for future applications of MLLMs in full research and offers valuable tools for the conservation of fish biodiversity.
期刊介绍:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.