GFIA: Generative Fault Image Analysis via vision–language model its application to train bogie transmission system

IF 3.1 · Zone 4 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Journal of Visual Communication and Image Representation Pub Date : 2025-06-25 DOI:10.1016/j.jvcir.2025.104482

Chunming Zhang , Yu Wang , Xinge You

Volume 111, Article 104482. Full text: https://www.sciencedirect.com/science/article/pii/S1047320325000963

Citations: 0

Abstract

Multimedia fault analytics plays a critical role in industrial applications, ensuring safety and reliability. Previous studies have explored fault classification using either one-dimensional signals or two-dimensional images, but understanding fault types and providing appropriate responses remains challenging, especially for complex system failures. To advance this field, we leverage the reasoning and generative capabilities of Large Multimodal Models (LMMs) for fault analysis, transforming multi-channel sensor signals from the system into structured grayscale images suitable for vision–language models. Additionally, we construct a domain-specific, strongly supervised dataset, the Bogie Transmission Unified Fault Dataset (BTU), which contains expert-curated fault types, causes, and solutions. By integrating both image and language modalities, we fine-tune a vision–language model, Generative Fault Image Analysis (GFIA), to enhance fault reasoning and interpretation. Extensive experiments on our BTU dataset demonstrate that GFIA achieves an average diagnostic accuracy exceeding 99.9% for motor faults, reaching 100% for gearbox faults, and exceeding 99.8% for left axle box faults. The proposed GFIA model outperforms traditional deep-learning methods and state-of-the-art large language models, highlighting the effectiveness of vision–language integration for fault analysis.
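
The abstract does not say how the multi-channel sensor signals are laid out as grayscale images, so the following is only a minimal sketch of one plausible encoding, not the authors' method. It assumes each channel is min-max scaled independently to 8-bit intensity, folded into a fixed number of rows, and stacked vertically with the other channels. The function name `signals_to_grayscale` and the parameter `rows_per_channel` are hypothetical.

```python
import numpy as np


def signals_to_grayscale(signals: np.ndarray, rows_per_channel: int) -> np.ndarray:
    """Map a (channels, samples) array to a 2-D uint8 image.

    Assumed layout (not taken from the paper): each channel is min-max
    scaled to [0, 255] on its own, folded into `rows_per_channel` rows,
    and the per-channel blocks are stacked top to bottom.
    """
    signals = np.asarray(signals, dtype=np.float64)
    if signals.ndim != 2:
        raise ValueError("expected an array of shape (channels, samples)")
    n_channels, n_samples = signals.shape
    if n_samples % rows_per_channel != 0:
        raise ValueError("samples must be divisible by rows_per_channel")

    lo = signals.min(axis=1, keepdims=True)
    hi = signals.max(axis=1, keepdims=True)
    # Avoid division by zero: a constant channel maps to intensity 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = np.rint((signals - lo) / span * 255.0).astype(np.uint8)

    # Row-major reshape keeps each channel's rows contiguous in the image.
    width = n_samples // rows_per_channel
    return scaled.reshape(n_channels * rows_per_channel, width)


# Synthetic stand-in for three sensor channels of 1024 samples each.
rng = np.random.default_rng(0)
demo = rng.standard_normal((3, 1024))
image = signals_to_grayscale(demo, rows_per_channel=16)
print(image.shape, image.dtype)
```

Per-channel scaling keeps a low-amplitude sensor from being washed out by a high-amplitude one, at the cost of discarding relative magnitude between channels; a global scale would make the opposite trade-off.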




Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.