Title: Multimodal Industrial Anomaly Detection via Uni-Modal and Cross-Modal Fusion
Authors: Hao Cheng; Jiaxiang Luo; Xianyong Zhang
DOI: 10.1109/TII.2025.3552723
Journal: IEEE Transactions on Industrial Informatics, vol. 21, no. 6, pp. 5000-5010 (Q1, Automation & Control Systems)
Published: 2025-04-03
URL: https://ieeexplore.ieee.org/document/10948502/
Multimodal Industrial Anomaly Detection via Uni-Modal and Cross-Modal Fusion
Constructing comprehensive multimodal feature representations from RGB images (RGB) and point clouds (PT) is crucial for 2D–3D multimodal anomaly detection (MAD) methods to reveal the various types of industrial anomalies. For multimodal representations, most existing MAD methods consider only the explicit spatial correspondence between the modality-specific features extracted from RGB and PT through space-aligned fusion, while overlooking the implicit interaction relationships between them. In this study, we propose a uni-modal and cross-modal fusion (UCF) method, which comprehensively incorporates the implicit relationships within and between modalities into multimodal representations. Specifically, UCF first establishes uni-modal and cross-modal embeddings to capture intramodal and intermodal relationships through uni-modal reconstruction and cross-modal mapping. Then, an adaptive nonequal fusion method is proposed to develop fusion embeddings, with the aim of preserving the primary features of the uni-modal and cross-modal embeddings while reducing interference between them. Finally, the uni-modal, cross-modal, and fusion embeddings all collaborate to reveal anomalies existing in different modalities. Experiments conducted on the MVTec 3D-AD benchmark and a real-world surface mount inspection task demonstrate that the proposed UCF outperforms existing approaches, particularly in precise anomaly localization.
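The three ingredients named in the abstract — uni-modal reconstruction, cross-modal mapping, and adaptive nonequal fusion of the resulting scores — can be sketched as follows. This is a minimal illustrative sketch only, not the paper's actual architecture: a rank-reduced SVD projection stands in for the learned uni-modal reconstruction, a least-squares map stands in for the learned cross-modal mapping, and the inverse-variance weighting is a hypothetical choice of "nonequal" fusion; all shapes and names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-patch features for the two modalities (RGB and point cloud).
n_patches, d = 16, 8
f_rgb = rng.normal(size=(n_patches, d))
f_pt = rng.normal(size=(n_patches, d))

def recon_score(f, rank=4):
    """Uni-modal 'reconstruction': a rank-reduced SVD projection stands in
    for a learned reconstruction model; the per-patch residual norm is the
    intramodal anomaly score."""
    u, s, vt = np.linalg.svd(f, full_matrices=False)
    f_hat = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.linalg.norm(f - f_hat, axis=1)

# Cross-modal 'mapping': a least-squares map RGB -> PT stands in for the
# learned cross-modal predictor; its residual is the intermodal score.
W, *_ = np.linalg.lstsq(f_rgb, f_pt, rcond=None)
s_cross = np.linalg.norm(f_rgb @ W - f_pt, axis=1)

s_rgb = recon_score(f_rgb)
s_pt = recon_score(f_pt)

# Adaptive nonequal fusion (hypothetical variant): weight each score stream
# by its inverse variance, so the more stable stream dominates the fused
# per-patch anomaly score instead of all streams being averaged equally.
scores = np.stack([s_rgb, s_pt, s_cross])      # (3, n_patches)
w = 1.0 / (scores.var(axis=1) + 1e-8)
w /= w.sum()                                   # nonequal weights, sum to 1
fused = w @ scores                             # per-patch anomaly score
```

Under this sketch, a patch is flagged when `fused` exceeds a threshold calibrated on anomaly-free data; the three streams remain available individually, mirroring the abstract's point that uni-modal, cross-modal, and fusion embeddings jointly reveal anomalies in different modalities.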
About the journal:
The IEEE Transactions on Industrial Informatics is a multidisciplinary journal dedicated to publishing technical papers that connect theory with practical applications of informatics in industrial settings. It focuses on the utilization of information in intelligent, distributed, and agile industrial automation and control systems. The scope includes topics such as knowledge-based and AI-enhanced automation, intelligent computer control systems, flexible and collaborative manufacturing, industrial informatics in software-defined vehicles and robotics, computer vision, industrial cyber-physical and industrial IoT systems, real-time and networked embedded systems, security in industrial processes, industrial communications, systems interoperability, and human-machine interaction.