The effect of the head number for multi-head self-attention in remaining useful life prediction of rolling bearing and interpretability

IF 5.5, Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing Pub Date : 2024-11-16 DOI:10.1016/j.neucom.2024.128946

Qiwu Zhao, Xiaoli Zhang, Fangzhen Wang, Panfeng Fan, Erick Mbeka

Neurocomputing, Volume 616, Article 128946. Source: https://www.sciencedirect.com/science/article/pii/S092523122401717X

Citations: 0

Abstract

The multi-head self-attention mechanism (MSM) is a machine learning (ML) model that encodes high-level feature representations, offers computational advantages, and processes sequences systematically without relying on recurrent neural network (RNN) models. However, model performance and computational results depend on the head number, and because the internal working mechanisms are complex, the lack of interpretability of this dependence has become a primary obstacle. This work therefore investigates how the head number of the MSM affects prediction accuracy, model robustness, and computational efficiency in remaining useful life (RUL) prediction of rolling bearings. The results show that prediction accuracy is reduced when the head number is either too large or too small. In addition, the more heads are selected, the more robust the model and the higher its predictive efficiency. These effects are explained through visualization of the attention weight distribution and of functional networks, which are constructed with an equivalent fully connected layer and solved with graph theory analysis. The model's attention coefficient distribution during training and prediction shows that when too few heads are selected, representative information is captured inadequately, so the MSM fails to assign large attention coefficients to degradation information. Conversely, models with too many heads acquire redundant information alongside representative degradation information. This redundant information disturbs the attention weight distribution and leads to incorrect allocation of attention. Both cases reduce prediction accuracy. Selection rules for the head number are established based on feature complexity, measured by sample entropy (SamEn), and a local range for head selection is identified from the relationship between head number and feature complexity. The effects of the head number on model robustness and computational efficiency are explained by changes in three graph parameters (average clustering coefficient, global efficiency, and average shortest path length), where the graph is constructed after solving the functional network. The research provides a reference for using the MSM to predict rolling bearing RUL with high accuracy, high computational efficiency, and strong robustness.
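To make the role of the head number concrete, the following is a minimal sketch of how a feature vector is split across heads and how each head produces its own attention weight distribution. It is not the authors' model: the function name `multi_head_attention_weights` is hypothetical, and for brevity the queries and keys are taken directly from slices of the input rather than from learned projection matrices.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_weights(x, num_heads):
    """Return one T x T attention weight matrix per head.

    x is a list of T feature vectors, each of length d_model.
    Assumption for illustration: Q = K = the head's slice of x,
    i.e. no learned projections are applied.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    scale = math.sqrt(d_head)

    weights = []
    for h in range(num_heads):
        # Each head only sees its own slice of every feature vector.
        sub = [vec[h * d_head:(h + 1) * d_head] for vec in x]
        head = []
        for q in sub:
            # Scaled dot-product scores of this query against all keys.
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in sub]
            head.append(softmax(scores))
        weights.append(head)
    return weights


if __name__ == "__main__":
    toy = [[1.0, 0.0, 0.5, 0.2],
           [0.0, 1.0, 0.1, 0.9],
           [0.5, 0.5, 0.3, 0.3]]
    for n_heads in (1, 2, 4):
        w = multi_head_attention_weights(toy, n_heads)
        print(n_heads, "head(s); first row of head 0:", w[0][0])
```

With more heads, each head attends over a narrower slice of the features, which is one intuition for why the head number changes what information the attention weights emphasize.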
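The abstract measures feature complexity with sample entropy but does not give the parameters used. The sketch below computes a common form of sample entropy, assuming an embedding dimension `m = 2`, a tolerance of 0.2 times the standard deviation, and the Chebyshev distance. These defaults and the function name `sample_entropy` are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) of a 1-D sequence.

    Assumptions: r = r_factor * population standard deviation,
    Chebyshev distance, and a match when the distance is <= r.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    r = r_factor * std

    def count_matches(length):
        # Use n - m templates for both lengths so the counts are comparable.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # skip self-matches
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined; reported as infinite here
    return math.log(b / a)


if __name__ == "__main__":
    rng = random.Random(0)
    regular = [0.0, 1.0] * 50
    noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("regular:", sample_entropy(regular))
    print("noisy:  ", sample_entropy(noisy))
```

A perfectly regular signal yields a low value and an irregular one a higher value, which is the sense in which sample entropy can rank features by complexity before a head number is chosen.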
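The three graph parameters named in the abstract have standard definitions for an undirected, unweighted graph, sketched below in plain Python. How the authors derive the graph from the functional network is not stated in the abstract, so the toy graphs and the helper names (`average_clustering`, `path_metrics`) are assumptions for illustration only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes of degree < 2)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nodes = list(nbrs)
        # Count edges that exist between pairs of neighbours.
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nodes[j] in adj[nodes[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def path_metrics(adj):
    """Return (global efficiency, average shortest path length).

    Unreachable pairs contribute 0 to the efficiency; the average
    path length is reported as infinity for a disconnected graph.
    """
    n = len(adj)
    if n < 2:
        raise ValueError("graph needs at least two nodes")
    inv_sum, dist_sum, reachable = 0.0, 0, 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                inv_sum += 1.0 / d
                dist_sum += d
                reachable += 1
    pairs = n * (n - 1)
    efficiency = inv_sum / pairs
    avg_path = dist_sum / reachable if reachable == pairs else float("inf")
    return efficiency, avg_path


if __name__ == "__main__":
    complete = {i: {j for j in range(4) if j != i} for i in range(4)}
    chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    for name, g in (("complete", complete), ("chain", chain)):
        print(name, average_clustering(g), path_metrics(g))
```

Denser, more tightly clustered graphs give higher global efficiency and shorter average paths, which is the direction of change the abstract associates with robustness and efficiency.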




Source journal

Neurocomputing (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.