DiffViT-IBFD: A rolling bearing fault diagnosis approach based on diffusion model and vision transformer under data imbalance conditions

Impact factor: 4.5 · JCR Q2 (Computer Science, Theory & Methods)
Array · Pub Date: 2025-08-07 · DOI: 10.1016/j.array.2025.100483
Zheru Dong, Wen Zhao, Di Zhu, Zixin Zhang, Yuheng Ren
Array, Volume 27, Article 100483 · https://www.sciencedirect.com/science/article/pii/S2590005625001109
Citations: 0

Abstract

DiffViT-IBFD: A rolling bearing fault diagnosis approach based on diffusion model and vision transformer under data imbalance conditions
Rolling bearing fault data collected from industrial sites often exhibit class imbalance, which significantly degrades the performance of deep learning-based bearing fault diagnosis models. Most existing studies use a Generative Adversarial Network (GAN) to generate minority-class samples and thereby improve model performance. However, GAN training is highly unstable and susceptible to mode collapse, resulting in generated samples of poor quality and low diversity. Given that the diffusion model was originally designed for image generation and trains relatively stably, this paper proposes DiffViT-IBFD, a rolling bearing fault diagnosis approach based on the diffusion model and Vision Transformer for data imbalance conditions. First, DiffViT-IBFD converts one-dimensional vibration data into two-dimensional time-frequency images through the short-time Fourier transform. Second, the diffusion model typically adopts U-Net as its backbone, which may suffer from vanishing or exploding gradients; moreover, U-Net's skip connections may fail to effectively integrate the low-level and high-level features of the data. A U-Net-based network, ReC-Unet, is therefore constructed to extract data features accurately and comprehensively, enabling the diffusion model to generate higher-quality time-frequency samples. Finally, because the patching operation in the Vision Transformer may lose horizontal information in the time-frequency image, a horizontal slicing strategy (Hs-Patch) is developed to comprehensively extract the horizontal features of time-frequency images, enhancing the feature representation capability of the Vision Transformer. Experimental results on two publicly available datasets show that DiffViT-IBFD outperforms existing methods under data imbalance conditions, validating its effectiveness.
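The first and last steps of the abstract (turning a 1-D vibration signal into a 2-D time-frequency image, then cutting that image horizontally) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `stft_magnitude` and `horizontal_strips`, the Hann window, the frame length and hop, the 12 kHz sampling rate, the synthetic 1500 Hz tone, and the choice of full-width strips as tokens. The abstract does not give the actual Hs-Patch geometry, so this shows only the general idea of slicing along the time axis.

```python
import numpy as np


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram of shape (freq_bins, frames), Hann-windowed.

    Hypothetical helper: window type and sizes are illustrative choices.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Cut the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One real FFT per frame; transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def horizontal_strips(image, strip_height):
    """Split an (H, W) image into H // strip_height full-width strips.

    Each strip is flattened to a vector of length strip_height * W, so it
    keeps the whole time axis of one frequency band. This is an assumed
    stand-in for a horizontal slicing strategy, not the paper's Hs-Patch.
    """
    height, width = image.shape
    if height % strip_height != 0:
        raise ValueError("image height must be divisible by strip_height")
    return image.reshape(height // strip_height, strip_height * width)


# Synthetic stand-in for a vibration signal: a pure 1500 Hz tone.
fs = 12_000  # assumed sampling rate in Hz
t = np.arange(1024) / fs
signal = np.sin(2 * np.pi * 1500 * t)

spec = stft_magnitude(signal)   # (33 frequency bins, 31 frames)
image = spec[:32]               # drop the Nyquist bin so the height divides evenly
strips = horizontal_strips(image, strip_height=4)  # 8 strips, 124 values each
```

With a 64-sample frame at 12 kHz, each frequency bin spans 187.5 Hz, so the 1500 Hz tone lands in bin 8 of `spec`. Each row of `strips` could then be linearly projected into a token embedding in the same way that flattened square patches are in a standard Vision Transformer.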
Source journal: Array (Computer Science, General Computer Science)
CiteScore: 4.40 · Self-citation rate: 0.00% · Articles published: 93 · Review time: 45 days