IFShip: Interpretable fine-grained ship classification with domain knowledge-enhanced vision-language models

IF 7.5 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Pub Date : 2025-04-15 DOI:10.1016/j.patcog.2025.111672

Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao

{"title":"IFShip: Interpretable fine-grained ship classification with domain knowledge-enhanced vision-language models","authors":"Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao","doi":"10.1016/j.patcog.2025.111672","DOIUrl":null,"url":null,"abstract":"<div><div>End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task. However, the inference process remains uninterpretable, leading to criticisms of these models as “black box” systems. To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, which is used to semi-automatically construct a task-specific instruction-following dataset, TITANIC-FGS. By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, resulting in a model named IFShip. Building upon IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language. Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy. Furthermore, compared to VLMs such as LLaVA and MiniGPT-4, IFShip demonstrates superior performance on the FGSC task. It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111672"},"PeriodicalIF":7.5000,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325003322","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task. However, the inference process remains uninterpretable, leading to criticisms of these models as “black box” systems. To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, which is used to semi-automatically construct a task-specific instruction-following dataset, TITANIC-FGS. By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, resulting in a model named IFShip. Building upon IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language. Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy. Furthermore, compared to VLMs such as LLaVA and MiniGPT-4, IFShip demonstrates superior performance on the FGSC task. It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not.

查看原文本刊更多论文

IFShip：基于领域知识增强的视觉语言模型的可解释细粒度船舶分类

端到端解释目前主导着遥感细粒度船舶分类（RS-FGSC）任务。然而，推理过程仍然是不可解释的，导致这些模型被批评为“黑箱”系统。为了解决这一问题，我们提出了一种领域知识增强的思维链（CoT）提示生成机制，该机制用于半自动构建特定任务指令遵循数据集泰坦尼克- fgs。通过在泰坦尼克- fgs上进行训练，我们将通用领域视觉语言模型（VLMs）应用于FGSC任务，得到了一个名为IFShip的模型。在IFShip的基础上，我们开发了一个FGSC视觉聊天机器人，它将FGSC问题重新定义为一步一步的推理任务，并以自然语言传达推理过程。实验结果表明，IFShip在可解释性和分类精度方面都优于最先进的FGSC算法。此外，与LLaVA和MiniGPT-4等vlm相比，IFShip在FGSC任务上表现出优越的性能。当细粒度的船型可以被人眼识别时，它提供了一个准确的推理链，当它们不能被人眼识别时，它提供了可解释的解释。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Pattern Recognition 工程技术-工程：电子与电气

CiteScore

14.40

自引率

16.20%

发文量

683

审稿时长

5.6 months

期刊介绍： The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.