RPS-DFN: Residual perception self-attention deep fusion network for multimodal IIoT device state identification

IF 7.6 · Zone 3, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Internet of Things Pub Date : 2025-10-06 DOI:10.1016/j.iot.2025.101790

Anying Chai, Zhaobo Fang, Ping Huang, Chenyang Guo, Lei Wang, Wanda Yin

Internet of Things, volume 34, Article 101790. https://www.sciencedirect.com/science/article/pii/S254266052500304X

Citations: 0

Abstract

The Industrial Internet of Things (IIoT) integrates advanced technologies such as the Internet of Things (IoT) and Artificial Intelligence (AI) into various aspects of industrial production, and it identifies equipment status accurately by deploying a large number of sensors. However, the data are heterogeneous, and traditional data fusion methods often overlook cross-modal interactions and feature contributions, which leads to poor fusion performance. In this paper, we propose an end-to-end Residual Perceptive Self-attention Deep Fusion Network (RPS-DFN) that fuses time-series signals such as force, vibration, and acoustic emission with device images captured at the same time. First, we propose a multi-modal data unification method based on the Mel-spectrogram transformation to align the dimensions of signals and images. Then, we improve a ResNet18 pre-trained on ImageNet by designing a shared dimensionality reduction layer and a cross-modal attention module. The general visual representations encoded in the pre-trained weights can be transferred to the small-sample equipment status detection task, sharpening the differences between features of different statuses. Finally, we design a two-layer Transformer encoder that learns the contributions and interactions of the different features for downstream tasks, achieving self-attentive deep fusion of the features. Experimental results show that our method achieves an accuracy of 95.24% on PHM2010 and 75.71% on the cross-tool status detection task on the small-sample Mudestera dataset, verifying the practical applicability of the proposed method.
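As a rough illustration of the self-attentive fusion idea described in the abstract, the sketch below applies single-head scaled dot-product self-attention to one feature vector per modality. It is not the authors' implementation: the paper's two-layer Transformer encoder uses learned projections, whereas this sketch assumes identity query/key/value projections, and the modality names, feature values, and function names are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real Transformer layer learns these.
    Returns the fused tokens and the attention weight matrix.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    # Row i holds how strongly modality i attends to every modality.
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        for q in tokens
    ]
    # Each fused token is a weighted average of all modality tokens.
    fused = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(dim)]
        for row in weights
    ]
    return fused, weights


# Hypothetical per-modality feature vectors (made-up values, not from the paper).
features = {
    "force": [0.9, 0.1, 0.3],
    "vibration": [0.8, 0.2, 0.4],
    "acoustic_emission": [0.1, 0.9, 0.5],
    "image": [0.2, 0.7, 0.6],
}

fused, weights = self_attention(list(features.values()))
for name, row in zip(features, weights):
    print(name, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1, so every fused vector is a convex combination of the modality features; the weights can be read as a crude indication of how much each modality contributes to the others.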


Source journal

Internet of Things Multiple-

CiteScore: 3.60

Self-citation rate: 5.10%

Articles published: 115

Review time: 37 days

About the journal: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross collaboration between researchers, engineers and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT. The journal places a high priority on timely publication and provides a home for high-quality work. Furthermore, the journal is interested in publishing topical Special Issues on any aspect of IoT.