Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing

Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao
{"title":"Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing","authors":"Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao","doi":"arxiv-2408.05307","DOIUrl":null,"url":null,"abstract":"Various machine learning (ML)-based in-situ monitoring systems have been\ndeveloped to detect laser additive manufacturing (LAM) process anomalies and\ndefects. Multimodal fusion can improve in-situ monitoring performance by\nacquiring and integrating data from multiple modalities, including visual and\naudio data. However, multimodal fusion employs multiple sensors of different\ntypes, which leads to higher hardware, computational, and operational costs.\nThis paper proposes a cross-modality knowledge transfer (CMKT) methodology that\ntransfers knowledge from a source to a target modality for LAM in-situ\nmonitoring. CMKT enhances the usefulness of the features extracted from the\ntarget modality during the training phase and removes the sensors of the source\nmodality during the prediction phase. This paper proposes three CMKT methods:\nsemantic alignment, fully supervised mapping, and semi-supervised mapping.\nSemantic alignment establishes a shared encoded space between modalities to\nfacilitate knowledge transfer. It utilizes a semantic alignment loss to align\nthe distributions of the same classes (e.g., visual defective and audio\ndefective classes) and a separation loss to separate the distributions of\ndifferent classes (e.g., visual defective and audio defect-free classes). The\ntwo mapping methods transfer knowledge by deriving the features of one modality\nfrom the other modality using fully supervised and semi-supervised learning.\nThe proposed CMKT methods were implemented and compared with multimodal\naudio-visual fusion in an LAM in-situ anomaly detection case study. The\nsemantic alignment method achieves a 98.4% accuracy while removing the audio\nmodality during the prediction phase, which is comparable to the accuracy of\nmultimodal fusion (98.2%).","PeriodicalId":501309,"journal":{"name":"arXiv - CS - Computational Engineering, Finance, and Science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computational Engineering, Finance, and Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.05307","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Various machine learning (ML)-based in-situ monitoring systems have been developed to detect laser additive manufacturing (LAM) process anomalies and defects. Multimodal fusion can improve in-situ monitoring performance by acquiring and integrating data from multiple modalities, including visual and audio data. However, multimodal fusion employs multiple sensors of different types, which leads to higher hardware, computational, and operational costs. This paper proposes a cross-modality knowledge transfer (CMKT) methodology that transfers knowledge from a source modality to a target modality for LAM in-situ monitoring. CMKT enhances the usefulness of the features extracted from the target modality during the training phase and removes the sensors of the source modality during the prediction phase. This paper proposes three CMKT methods: semantic alignment, fully supervised mapping, and semi-supervised mapping. Semantic alignment establishes a shared encoded space between modalities to facilitate knowledge transfer. It uses a semantic alignment loss to align the distributions of the same classes across modalities (e.g., the visual defective and audio defective classes) and a separation loss to separate the distributions of different classes (e.g., the visual defective and audio defect-free classes). The two mapping methods transfer knowledge by deriving the features of one modality from the other using fully supervised and semi-supervised learning. The proposed CMKT methods were implemented and compared with multimodal audio-visual fusion in an LAM in-situ anomaly detection case study. The semantic alignment method achieved 98.4% accuracy while removing the audio modality during the prediction phase, comparable to the accuracy of multimodal fusion (98.2%).
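
To make the semantic alignment method concrete, the following PyTorch-style sketch shows one plausible form of the two losses described above: an alignment term that pulls same-class cross-modal feature pairs together and a margin-based separation term that pushes different-class pairs apart. The shapes, label encoding, and `margin` parameter are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_losses(z_vis, z_aud, y_vis, y_aud, margin=1.0):
    """Sketch of the two losses in the semantic alignment method.

    z_vis: (N, d) visual features encoded into the shared space.
    z_aud: (M, d) audio features encoded into the shared space.
    y_vis, y_aud: integer class labels (e.g., 0 = defect-free, 1 = defective).
    margin: hypothetical separation margin (not specified in the abstract).
    """
    # Pairwise Euclidean distances between every visual/audio feature pair.
    dists = torch.cdist(z_vis, z_aud)                  # (N, M)
    same = y_vis.unsqueeze(1) == y_aud.unsqueeze(0)    # (N, M) boolean mask

    # Alignment loss: draw same-class cross-modal pairs together
    # (e.g., visual defective with audio defective).
    align_loss = (dists[same] ** 2).mean()

    # Separation loss: push different-class cross-modal pairs at least
    # `margin` apart (e.g., visual defective vs. audio defect-free).
    sep_loss = (F.relu(margin - dists[~same]) ** 2).mean()

    return align_loss, sep_loss
```

During training, these two terms would be combined with the usual classification loss; their relative weighting is not given in the abstract and would have to be tuned.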
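
The two mapping methods can be sketched in a similar spirit: a small regression network learns to derive pseudo-source-modality features (here, audio) from target-modality features (here, visual), so the audio sensor can be dropped at prediction time. The architecture, dimensions, and helper names below are assumptions for illustration only.

```python
import torch.nn as nn
import torch.nn.functional as F

class VisualToAudioMapper(nn.Module):
    """Hypothetical mapper that predicts audio features from visual features."""

    def __init__(self, vis_dim=128, aud_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vis_dim, 256),
            nn.ReLU(),
            nn.Linear(256, aud_dim),
        )

    def forward(self, z_vis):
        return self.net(z_vis)

def fully_supervised_mapping_loss(mapper, z_vis, z_aud):
    # Fully supervised mapping: regress the mapped visual features onto
    # the true audio features of paired, labeled samples.
    return F.mse_loss(mapper(z_vis), z_aud)
```

The semi-supervised variant would additionally exploit partially labeled data; at prediction time only the visual branch and the trained mapper are used, which is how the audio modality is removed.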