3D-MMFN: Multi-level multimodal fusion network for 3D industrial image anomaly detection

Mujtaba Asad, Waqar Azeem, Aftab Ahmad Malik, He Jiang, Ahmad Ali, Jie Yang, Wei Liu

Advanced Engineering Informatics, Volume 65, Article 103284. Published 2025-03-31. DOI: 10.1016/j.aei.2025.103284
Citations: 0
Abstract
3D-based image anomaly detection (AD) is a crucial computer vision task in industrial manufacturing. Most existing methods focus only on 2D shape-based detection, and research on detecting anomalies in 3D shapes using multimodal features remains limited. The techniques that have been developed for this task are largely unsuitable for industrial defect detection for several reasons. Firstly, they rely heavily on memory banks, resulting in high storage overheads that make them difficult to deploy on production lines. Secondly, existing 3D industrial AD algorithms concatenate multimodal features directly, which causes significant disruption between the features and degrades detection efficiency. Thirdly, their inference speed is not fast enough for real-time detection. To address these challenges, we propose a deployment-friendly network named 3D-MMFN. Our model comprises the following components: (1) pre-trained feature extractors that generate local features from multi-stream inputs of RGB images, surface normal maps, and point clouds; (2) a novel point-to-pixel fusion module that efficiently fuses multi-level multimodal features to mitigate disruption during the fusion operation; and (3) an anomaly generator module that synthesizes anomalous features from normal multimodal fused features, enabling self-supervised training of 3D-MMFN while eliminating the need for large memory banks. Experimental results on the MVTec3D-AD and Eyecandies datasets demonstrate the effectiveness of our proposed model, showcasing significant performance improvements over state-of-the-art methods.
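The core idea behind point-to-pixel fusion is to bring point-cloud features into the same spatial grid as the image features before combining them, rather than naively concatenating unaligned feature vectors. The paper does not publish its implementation here, so the following is a minimal illustrative sketch of one common way to do this: project 3D points through a pinhole camera model onto the image plane and scatter-average their features into a pixel grid. All names (`project_points_to_pixels`, the toy intrinsics `K`) are hypothetical and not taken from the paper.

```python
import numpy as np

def project_points_to_pixels(points, feats, K, h, w):
    """Project 3D points into the image plane with intrinsics K and
    scatter their features onto an (h, w, c) pixel grid, averaging
    points that land on the same pixel.

    Hypothetical helper for illustration -- not the paper's actual
    fusion module.
    """
    uvw = points @ K.T                              # pinhole projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    grid = np.zeros((h, w, feats.shape[1]))
    count = np.zeros((h, w, 1))
    for (u, v), f in zip(uv, feats):
        if 0 <= v < h and 0 <= u < w:               # drop out-of-view points
            grid[v, u] += f
            count[v, u] += 1
    return grid / np.maximum(count, 1)              # average collisions

# Toy example: two points with 2-dim features, a 4x4 pixel grid.
K = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 1.0],
                [0.5, 0.5, 1.0]])
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
pixel_feats = project_points_to_pixels(pts, feats, K, 4, 4)
```

Once point features live on the pixel grid, they can be fused with image features position-by-position (e.g. by a learned weighting) instead of blind channel concatenation, which is the kind of misalignment-induced "disruption" the abstract refers to.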
About the journal:
Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitative and quantitative. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus, and INSPEC.