A distributed training method with intergenerational accumulation and cross-node random drop for mechanical fault diagnosis
Zongliang Xie, Kaiyu Zhang, Jinglong Chen, Chi-Guhn Lee, Shuilong He
Applied Soft Computing, Volume 181, Article 113532. DOI: 10.1016/j.asoc.2025.113532. Published 30 June 2025.
With the increasing complexity of deep neural networks and the continuous expansion of training datasets, the computational cost of model training grows exponentially. To reduce training time, distributed training systems that leverage multiple computing devices have been developed for computational acceleration. However, compared with the rapid growth in computing power, the communication bandwidth between devices increases slowly and has become a bottleneck that restricts the efficiency of distributed training. In this paper, an efficient distributed training method called gradient transfer compression (GTC) is proposed to reduce communication overhead and improve training efficiency. The method combines three key techniques: (1) intergenerational accumulation, in which gradients generated over multiple iterations are stored and accumulated, reducing the frequency of communication between computing devices; (2) cross-node random drop, which synchronizes only a specified ratio of the gradients to decrease network traffic while ensuring model convergence; and (3) mixed precision training, which reduces the bandwidth required for gradient communication. The effectiveness of GTC is demonstrated through experiments on two rolling bearing datasets. Compared with the conventional PyTorch distributed training method, the proposed method reduces GPU memory usage by 97.10% and 14.02% and increases training efficiency by 24.74% and 8.03% in the two cases, respectively, while maintaining the diagnostic performance of the model.
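The abstract describes the three techniques only at a high level, but they map naturally onto a manual PyTorch data-parallel loop. The following is a minimal, hypothetical sketch rather than the authors' implementation: the names gtc_train_step, accum_steps, and drop_ratio are invented here, GradScaler-based loss scaling is omitted, the model is assumed to be a plain module replicated on every rank (not DDP-wrapped, since synchronization is done by hand), and the gradient elements excluded by the random drop are simply left unsynchronized on each rank, which may differ from how the paper treats them.

```python
import torch
import torch.distributed as dist
from torch.cuda.amp import autocast  # mixed precision forward/backward


def gtc_train_step(model, batch, optimizer, step, accum_steps=4, drop_ratio=0.5):
    """One iteration of a hypothetical GTC-style loop (all names invented here).

    Assumes dist.init_process_group() has been called and the caller has
    already moved the model and the batch to the right device.
    """
    x, y = batch

    # Mixed precision forward/backward; loss scaling is omitted for brevity.
    with autocast():
        loss = torch.nn.functional.cross_entropy(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in p.grad across iterations

    # (1) Intergenerational accumulation: gradients from accum_steps iterations
    # share a single synchronization, cutting communication frequency.
    if (step + 1) % accum_steps != 0:
        return loss.detach()

    world = dist.get_world_size()
    for i, p in enumerate(model.parameters()):
        if p.grad is None:
            continue
        flat = p.grad.view(-1)

        # (2) Cross-node random drop: only a (1 - drop_ratio) fraction of the
        # gradient elements is synchronized. Seeding with (step, i) makes every
        # rank select the same indices; dropped elements keep their local values.
        keep = max(1, int(flat.numel() * (1.0 - drop_ratio)))
        gen = torch.Generator().manual_seed(1_000_003 * step + i)
        idx = torch.randperm(flat.numel(), generator=gen)[:keep].to(flat.device)

        # (3) Mixed precision communication: the selected values are sent as
        # FP16, halving the per-element bandwidth of the all-reduce.
        vals = flat[idx].to(torch.float16)
        dist.all_reduce(vals, op=dist.ReduceOp.SUM)
        flat[idx] = vals.to(flat.dtype) / world  # average across ranks

    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```

In a real run, each process would call dist.init_process_group before training, build an identical model on every rank, and feed rank-specific data (for example via a DistributedSampler); the communication savings then come from synchronizing once every accum_steps iterations, sending only the retained fraction of gradient elements, and transmitting them in half precision.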
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is on publishing the highest-quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.