CLIP-AFIR: A Contrastive Language-Image Pretraining Model for Accurate Fish Individual Fine-grained Re-identification

Impact Factor 3.9 | CAS Tier 1 (Agriculture & Forestry Sciences) | JCR Q1 (Fisheries)
Jianing Quan, Can Wang, Yunchen Tian
Aquaculture, vol. 610, Article 742885. Published 2025-07-04. DOI: 10.1016/j.aquaculture.2025.742885. Full text: https://www.sciencedirect.com/science/article/pii/S0044848625007719
Citations: 0

Abstract

Much like identifying individual humans, accurately determining "who is who" is a critical component of intelligent aquaculture, with broad applications in disease analysis, growth monitoring, and other tasks. The accuracy of fish Re-Identification (ReID) has improved greatly in recent years, yet challenges remain when distinguishing individuals of the same species, such as low precision and ID switching. Inspired by multimodal large models, we propose a Contrastive Language-Image Pretraining model for Accurate Fish Individual fine-grained Re-identification (CLIP-AFIR). Building on cross-modal contrastive learning, a set of trainable text tokens is introduced to represent different individuals, combined with the proposed Prompt Learner Module (PLM). The text and image encoders are trained in a two-stage paradigm, which improves their adaptability to fish recognition. To better discriminate subtle individual differences, a lightweight Fine-grained Feature Enhancement Module (FFEM) is further designed: it computes masked self-attention over local regions of the image using shifted windows with overlapping areas, enabling effective representation of fine-grained local variations. On the constructed grouper dataset, the proposed CLIP-AFIR shows a significant improvement in the evaluations. Applied to the non-continuous fish individual recognition task, CLIP-AFIR achieves an accuracy of 98.6%, surpassing the state of the art by 5.4 percentage points.
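As a rough illustration of the mechanism described above, namely a Prompt Learner that assigns trainable text tokens to each individual and aligns them with image features through a CLIP-style contrastive objective, the following minimal PyTorch sketch may be helpful. All module names, shapes, and hyperparameters (PromptLearner, TextEncoderStub, embed_dim=512, four prompt tokens per identity, temperature 0.07) are assumptions made for illustration and are not taken from the paper.

# Minimal sketch (not the authors' code): CoOp-style learnable prompts per fish
# identity, aligned with image features by a CLIP-style symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    # Hypothetical stand-in for the paper's Prompt Learner Module (PLM):
    # each fish identity owns a small bank of trainable text tokens.
    def __init__(self, num_ids: int, num_tokens: int = 4, embed_dim: int = 512):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_ids, num_tokens, embed_dim) * 0.02)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (B,) integer identity labels -> (B, num_tokens, embed_dim)
        return self.tokens[ids]

class TextEncoderStub(nn.Module):
    # Placeholder for a CLIP text encoder; here it simply pools and projects tokens.
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        return self.proj(token_embeds.mean(dim=1))  # (B, embed_dim)

def clip_contrastive_loss(img_feat, txt_feat, temperature: float = 0.07):
    # Symmetric InfoNCE over matched image/text pairs; assumes each identity
    # appears at most once per batch (duplicates would become false negatives).
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    logits = img_feat @ txt_feat.t() / temperature  # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with random stand-in features; a real pipeline would obtain img_feat
# from a CLIP image encoder applied to fish images.
B, D, num_ids = 8, 512, 100
prompt_learner, text_encoder = PromptLearner(num_ids, embed_dim=D), TextEncoderStub(D)
ids = torch.randint(0, num_ids, (B,))
img_feat = torch.randn(B, D)
txt_feat = text_encoder(prompt_learner(ids))
loss = clip_contrastive_loss(img_feat, txt_feat)
loss.backward()

In a full pipeline, a plausible reading of the two-stage paradigm mentioned in the abstract is to first optimize the prompt tokens with the encoders frozen and then fine-tune the encoders themselves, but the actual training schedule and the FFEM's overlapping shifted-window masked self-attention are specified only in the paper.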
Source journal
Aquaculture (Agriculture & Forestry Sciences; Marine & Freshwater Biology)
CiteScore: 8.60
Self-citation rate: 17.80%
Articles published: 1246
Review time: 56 days
Journal description: Aquaculture is an international journal for the exploration, improvement and management of all freshwater and marine food resources. It publishes novel and innovative research of world-wide interest on farming of aquatic organisms, which includes finfish, mollusks, crustaceans and aquatic plants for human consumption. Research on ornamentals is not a focus of the Journal. Aquaculture only publishes papers with a clear relevance to improving aquaculture practices or a potential application.