FFM-ViT: an efficient fish species classification method based on deep features and transformers.

IF 2; Zone 3, Agricultural and Forestry Sciences; Q2 FISHERIES

Journal of fish biology Pub Date : 2025-09-30 DOI:10.1111/jfb.70213

Yuwei Gao, Xiaoyong Li, Jian Xiang, Yunjie Xie, Chen Yang, Wan Xiang

Citations: 0

Abstract

Morphological identification of fish species plays a crucial role in monitoring and managing fishery resources and in biodiversity conservation. However, existing classification methods fall short of practical needs when fish data sets are small and the species are highly similar. In this paper, we propose a novel deep learning method called feature fusion module vision transformer (FFM-ViT). FFM-ViT abandons the direct patch operation used in the traditional vision transformer (ViT) and instead introduces Mobile Inverted Bottleneck Convolution (MBConv) and Fused Mobile Inverted Bottleneck Convolution (Fuse-MBConv) blocks to obtain more accurate high-dimensional information. To enhance feature extraction and channel feature fusion, we also introduce the channel spatial merge attention (CSMA) module. Furthermore, we curated a 78-category dataset named Oceanfish78. Our model achieves 90.2% accuracy on this dataset, significantly surpassing the 80.4% achieved by the ViT model without pre-trained weights. Additionally, we ran tests on several other datasets, such as fish4knowledge and Fish31, and compared our method with other deep learning models, including ShuffleNet, ConvNeXt and Swin Transformer. The results demonstrate that our proposed method outperforms these existing approaches, providing an effective solution for fish classification and offering valuable insights for recognizing similar targets in other environments.
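To make the contrast between direct patching and a convolutional front end concrete, the following is a minimal shape-arithmetic sketch, not the authors' implementation. The abstract does not give the number of blocks, kernel sizes, strides, channel widths, or input resolution, so the stage list `HYPOTHETICAL_STEM`, the 224-pixel input, and the 16-pixel patch size are all assumptions chosen only for illustration. It shows that a stack of strided convolutional stages can hand the transformer the same number of tokens as one-step patching while reducing resolution gradually.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """One downsampling stage; the name is a label only, not a real layer."""
    name: str
    kernel: int
    stride: int
    padding: int
    out_channels: int


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_tokens(image_size: int, patch_size: int) -> int:
    """Token count when the image is cut directly into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def stem_tokens(image_size: int, stages: Sequence[Stage]) -> Tuple[int, int]:
    """Token count and channel width after a stack of strided stages."""
    side = image_size
    for stage in stages:
        side = conv_out(side, stage.kernel, stage.stride, stage.padding)
    return side * side, stages[-1].out_channels


# Assumed configuration: four 3x3, stride-2 stages (values are illustrative,
# not taken from the paper).
HYPOTHETICAL_STEM = (
    Stage("Fused-MBConv", kernel=3, stride=2, padding=1, out_channels=24),
    Stage("Fused-MBConv", kernel=3, stride=2, padding=1, out_channels=48),
    Stage("MBConv", kernel=3, stride=2, padding=1, out_channels=96),
    Stage("MBConv", kernel=3, stride=2, padding=1, out_channels=192),
)

if __name__ == "__main__":
    print("direct patching tokens:", patch_tokens(224, 16))
    print("conv stem (tokens, channels):", stem_tokens(224, HYPOTHETICAL_STEM))
```

Under these assumed settings both routes yield a 14 x 14 token grid; the difference is that the staged stem reaches it through four overlapping 3x3 convolutions rather than a single non-overlapping 16x16 cut.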

Source journal

Journal of fish biology (Biology: Marine and Freshwater Biology)

CiteScore: 4.00

Self-citation rate: 10.00%

Articles published: 292

Review time: 3 months

About the journal: The Journal of Fish Biology is a leading international journal for scientists engaged in all aspects of fishes and fisheries research, both fresh water and marine. The journal publishes high-quality papers relevant to the central theme of fish biology and aims to bring together under one cover an overall picture of the research in progress and to provide international communication among researchers in many disciplines with a common interest in the biology of fish.