Title: RPS-DFN: Residual perception self-attention deep fusion network for multimodal IIoT device state identification
Authors: Anying Chai, Zhaobo Fang, Ping Huang, Chenyang Guo, Lei Wang, Wanda Yin
Journal: Internet of Things, volume 34, Article 101790
Publication date: 2025-10-06
Publication type: Journal Article
DOI: 10.1016/j.iot.2025.101790
URL: https://www.sciencedirect.com/science/article/pii/S254266052500304X
Periodical IF: 7.6
JCR: Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Open access: no
Citations: 0
Abstract
The Industrial Internet of Things (IIoT) integrates advanced technologies such as the Internet of Things (IoT) and Artificial Intelligence (AI) into various aspects of industrial production and achieves accurate identification of equipment status by deploying a large number of sensors. However, the data are heterogeneous, and traditional data fusion methods often overlook cross-modal interactions and feature contributions, which leads to poor fusion performance. In this paper, we propose an end-to-end Residual Perceptive Self-attention Deep Fusion Network (RPS-DFN) that fuses time-series signals such as force, vibration, and acoustic emission with device images captured at the same time. First, we propose a multimodal data unification method based on Mel-spectrogram transformation to align the dimensions of signals and images. Then, we improve a ResNet18 pre-trained on ImageNet by designing a shared dimensionality reduction layer and a cross-modal attention module. The general visual representations learned by the pre-trained weights can be transferred to the small-sample equipment status detection task, enhancing the differences between features of different statuses. Finally, we design a two-layer Transformer encoder that learns the contributions and interactions of different features for downstream tasks, achieving self-attentive deep fusion of features. Experimental results show that our method achieves an accuracy of 95.24% on PHM2010 and 75.71% on the cross-tool status detection task on the small-sample Mudestera dataset, verifying the practical applicability of the proposed method.
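To make the idea of self-attentive fusion concrete, the following is a minimal sketch, not the RPS-DFN implementation. It assumes one feature vector per modality and applies a single pass of scaled dot-product self-attention with no learned query/key/value projections, no multiple heads, and no feed-forward layers, all of which a real two-layer Transformer encoder would have. The function names and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(tokens):
    """Fuse per-modality feature vectors with scaled dot-product self-attention.

    tokens: list of equal-length feature vectors, one per modality.
    Returns (fused, weights), where weights[i][j] is how strongly
    token i attends to token j, and fused is the mean of the attended tokens.
    Assumption: identity projections stand in for learned Q/K/V matrices.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights, attended = [], []
    for query in tokens:
        # Similarity of this token to every token (including itself).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Attention-weighted mix of all tokens.
        attended.append(
            [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        )
    # Mean-pool the attended tokens into one fused representation.
    fused = [sum(tok[d] for tok in attended) / len(attended) for d in range(dim)]
    return fused, weights


# Hypothetical toy features; real features would come from the backbone network.
features = {
    "force": [0.9, 0.1, 0.0, 0.2],
    "vibration": [0.8, 0.2, 0.1, 0.1],
    "acoustic_emission": [0.1, 0.9, 0.3, 0.0],
    "image": [0.2, 0.1, 0.9, 0.4],
}
fused, weights = self_attention_fuse(list(features.values()))
```

Each row of `weights` sums to 1 and can be read as the relative contribution of every modality to one token's updated representation, which is the sense in which attention models feature contributions and interactions.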
About the journal:
Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross collaboration between researchers, engineers and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT.
The journal will place a high priority on timely publication and provide a home for high-quality research.
Furthermore, IOT is interested in publishing topical Special Issues on any aspect of IOT.