Accurate industrial anomaly detection with efficient multimodal fusion

IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS
Array. Pub Date: 2025-09-19. DOI: 10.1016/j.array.2025.100512
Dinh-Cuong Hoang, Phan Xuan Tan, Anh-Nhat Nguyen, Ta Huu Anh Duong, Tuan-Minh Huynh, Duc-Manh Nguyen, Minh-Duc Cao, Duc-Huy Ngo, Thu-Uyen Nguyen, Khanh-Toan Phan, Minh-Quang Do, Xuan-Tung Dinh, Van-Hiep Duong, Ngoc-Anh Hoang, Van-Thiep Nguyen
Array, volume 28, Article 100512. https://www.sciencedirect.com/science/article/pii/S2590005625001390
Citations: 0

Abstract

Industrial anomaly detection is critical for ensuring quality and efficiency in modern manufacturing. However, existing deep learning models that rely solely on red-green-blue (RGB) images often fail to detect subtle structural defects, while most RGB-depth (RGBD) methods are computationally heavy and fragile in the presence of missing or noisy depth data. In this work, we propose a lightweight and real-time RGBD anomaly detection framework that not only refines per-modality features but also performs robust hierarchical fusion and tolerates missing inputs. Our approach employs a shared ResNet-50 backbone with a Modality-Specific Feature Enhancement (MSFE) module to amplify texture and geometric cues, followed by a Hierarchical Multi-Modal Fusion (HMM) encoder for cross-scale integration. We further introduce a curriculum-based anomalous feature generator to produce context-aware perturbations, training a compact two-layer discriminator to yield precise pixel-level normality scores. Extensive experiments on the MVTec Anomaly Detection (MVTec-AD) dataset, the Visual Anomaly (VisA) dataset, and a newly collected RealSense D435i RGBD dataset demonstrate up to 99.0% Pixel-level Area Under the Receiver Operating Characteristic Curve (P-AUROC), 99.6% Image-level AUROC (I-AUROC), 82.6% Area Under the Per-Region Overlap (AUPRO), and 45 frames per second (FPS) inference speed. These results validate the effectiveness and deployability of our approach in high-throughput industrial inspection scenarios.
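The two AUROC figures quoted above differ only in what counts as a sample: I-AUROC ranks whole images, while P-AUROC ranks individual pixels. The following is a minimal sketch of that distinction, not the authors' evaluation code. The function names, the toy anomaly maps, and the choice of the maximum pixel score as the image-level score are all assumptions made for illustration. The AUROC itself is computed with the standard rank-based (Mann-Whitney U) formulation, using average ranks for tied scores.

```python
from typing import List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic (average ranks for ties).

    labels: 1 = anomalous, 0 = normal; a higher score means "more anomalous".
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one normal and one anomalous sample")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def image_score(anomaly_map: List[List[float]]) -> float:
    """Image-level score as the maximum pixel score (assumed convention)."""
    return max(max(row) for row in anomaly_map)


def flatten(grid: List[List[float]]) -> List[float]:
    return [v for row in grid for v in row]


# Hypothetical 2x2 anomaly maps and ground-truth masks for three images.
maps = [
    [[0.1, 0.2], [0.1, 0.3]],  # normal image
    [[0.2, 0.9], [0.1, 0.8]],  # anomalous image
    [[0.1, 0.1], [0.4, 0.2]],  # normal image
]
masks = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 1]],
    [[0, 0], [0, 0]],
]
image_labels = [0, 1, 0]

# I-AUROC: one score and one label per image.
i_auroc = auroc([image_score(m) for m in maps], image_labels)

# P-AUROC: one score and one label per pixel, pooled over all images.
pixel_scores = [v for m in maps for v in flatten(m)]
pixel_labels = [int(v) for m in masks for v in flatten(m)]
p_auroc = auroc(pixel_scores, pixel_labels)
```

Because P-AUROC pools every pixel and normal pixels vastly outnumber defective ones in real inspection data, it tends to sit close to its ceiling; this is one reason a region-based metric such as AUPRO is usually reported alongside it.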
Source journal: Array (Computer Science, General Computer Science)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 93
Review time: 45 days