DiffViT-IBFD: A rolling bearing fault diagnosis approach based on diffusion model and vision transformer under data imbalance conditions
Zheru Dong, Wen Zhao, Di Zhu, Zixin Zhang, Yuheng Ren
Array, Volume 27, Article 100483, published 2025-08-07
DOI: 10.1016/j.array.2025.100483
https://www.sciencedirect.com/science/article/pii/S2590005625001109
Rolling bearing fault data collected at industrial sites often exhibit an imbalanced class distribution, which significantly degrades the performance of deep learning-based intelligent fault diagnosis models. Most existing studies use a Generative Adversarial Network (GAN) to generate minority-class samples and so improve model performance. However, GAN training is highly unstable and susceptible to mode collapse, which yields generated samples of poor quality and low diversity. The diffusion model was originally designed for image generation, and its training process is relatively stable. This paper therefore proposes DiffViT-IBFD, a new rolling bearing fault diagnosis approach for data imbalance conditions that is based on the diffusion model and the Vision Transformer. First, DiffViT-IBFD converts one-dimensional vibration data into two-dimensional time-frequency images using the short-time Fourier transform (STFT). Second, the diffusion model typically adopts Unet as its backbone, which may suffer from vanishing or exploding gradients. In addition, Unet's skip connection strategy may fail to effectively integrate the low-level and high-level features of the data. A Unet-based network, ReC-Unet, is therefore constructed to extract data features more accurately and comprehensively, enabling the diffusion model to generate higher-quality time-frequency samples. Finally, the patching operation in the Vision Transformer may lose horizontal information in the time-frequency image. A horizontal slicing strategy (Hs-Patch) is developed to extract the horizontal features of time-frequency images more comprehensively, enhancing the feature representation capability of the Vision Transformer. Experimental results on two publicly available datasets show that DiffViT-IBFD outperforms existing methods under data imbalance conditions, validating its effectiveness.
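The generative step exists to fill in minority fault classes. As a rough illustration of the bookkeeping involved, the sketch below computes how many synthetic samples each class would need to match the majority class. Full balancing to the majority count is an assumption, since the abstract does not state a target ratio. The helper name `synthetic_sample_budget` and the class labels and counts are hypothetical and do not come from the paper's datasets.

```python
from collections import Counter


def synthetic_sample_budget(labels):
    """Hypothetical helper: number of generated samples per class needed to
    reach the size of the largest class (assumes full balancing)."""
    counts = Counter(labels)
    target = max(counts.values())
    # Classes are sorted by name so the result order is deterministic.
    return {cls: target - n for cls, n in sorted(counts.items())}


# Illustrative, made-up class counts for an imbalanced bearing dataset.
labels = ["normal"] * 1000 + ["inner_race"] * 50 + ["outer_race"] * 80 + ["ball"] * 20
print(synthetic_sample_budget(labels))
# {'ball': 980, 'inner_race': 950, 'normal': 0, 'outer_race': 920}
```

In practice a partial target, such as a fixed imbalance ratio, may be preferable to full balancing. The helper would then use that ratio instead of `max(counts.values())`.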
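The first step turns a 1-D vibration signal into a 2-D time-frequency image with the STFT. The NumPy sketch below frames the signal, applies a Hann window, takes the one-sided FFT of each frame, and scales the log-magnitude to [0, 1]. Frequency runs along the vertical axis and time along the horizontal axis. Several settings are assumptions, not the paper's values: the frame length `n_fft`, the hop size, the 12 kHz sampling rate, the log/min-max scaling, and the synthetic 1.5 kHz test tone.

```python
import numpy as np


def stft_image(signal, n_fft=128, hop=32):
    """Convert a 1-D vibration signal into a 2-D log-magnitude STFT image.

    Sketch only: window, n_fft, hop and scaling are assumed choices.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Overlapping windowed frames, shape (n_frames, n_fft).
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided magnitude spectrum per frame, shape (n_frames, n_fft // 2 + 1).
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so rows are frequency bins and columns are time frames.
    image = np.log1p(spectrum).T
    # Min-max scale to [0, 1] so the result can be treated as a grayscale image.
    return (image - image.min()) / (image.max() - image.min() + 1e-12)


fs = 12_000  # assumed sampling rate in Hz
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 1_500 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
img = stft_image(x)
print(img.shape)  # (65, 61)
```

With a 12 kHz rate and `n_fft=128`, bins are 93.75 Hz apart, so the 1.5 kHz tone shows up as a bright horizontal line at row 16.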
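The abstract does not specify the diffusion formulation. The sketch below assumes the common DDPM setup: a linear beta schedule, the closed-form forward noising x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, and a mean-squared-error loss on predicted noise. In DiffViT-IBFD the noise predictor would be the ReC-Unet network. Here that network is not implemented, and `q_sample`, `linear_beta_schedule` and `noise_prediction_loss` are illustrative names, not the paper's code.

```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Linear variance schedule as in the original DDPM paper (assumed here).
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, noise):
    """Closed-form forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def noise_prediction_loss(predicted_eps, eps):
    # Simplified DDPM objective: MSE between the true and the predicted noise.
    return float(np.mean((predicted_eps - eps) ** 2))


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

rng = np.random.default_rng(0)
x0 = rng.random((65, 61))            # stand-in for a normalized time-frequency image
eps = rng.standard_normal(x0.shape)  # Gaussian noise the denoiser must predict
x_mid = q_sample(x0, 500, alpha_bar, eps)
x_end = q_sample(x0, 999, alpha_bar, eps)
```

At the final step `alpha_bar` is on the order of 1e-5, so `x_end` is almost pure noise. Generation runs this process in reverse, using the trained noise predictor to denoise step by step back to an image.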
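The abstract does not define Hs-Patch beyond "horizontal slicing." One plausible reading is that each token is a full-width strip of rows, so that it covers the whole time axis of a frequency band. A standard ViT instead cuts the image into square patches. The NumPy sketch below contrasts the two tokenizations under that assumption. `horizontal_strips` is a hypothetical helper, and the strip height and image size are arbitrary.

```python
import numpy as np


def square_patches(image, p):
    """Standard ViT patching: non-overlapping p x p patches, each flattened."""
    h, w = image.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    # (h/p, p, w/p, p) -> (h/p, w/p, p, p) -> (num_patches, p*p), row-major patch order.
    patches = image.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)


def horizontal_strips(image, s):
    """Hypothetical Hs-Patch sketch: full-width strips of s rows, each flattened."""
    h, w = image.shape
    assert h % s == 0, "image height must be divisible by strip height"
    # Consecutive groups of s rows become one token of length s*w.
    return image.reshape(h // s, s * w)


img = np.arange(64 * 64, dtype=float).reshape(64, 64)
sq = square_patches(img, 16)    # 16 tokens of 256 values
hs = horizontal_strips(img, 4)  # 16 tokens of 256 values
```

Both tokenizations yield 16 tokens of 256 values. However, each strip token keeps an entire row band intact, whereas square patching splits that band across four separate tokens. In a full model each token would then pass through a linear projection to the embedding dimension.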