Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing

Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao
{"title":"Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing","authors":"Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao","doi":"arxiv-2408.05307","DOIUrl":null,"url":null,"abstract":"Various machine learning (ML)-based in-situ monitoring systems have been\ndeveloped to detect laser additive manufacturing (LAM) process anomalies and\ndefects. Multimodal fusion can improve in-situ monitoring performance by\nacquiring and integrating data from multiple modalities, including visual and\naudio data. However, multimodal fusion employs multiple sensors of different\ntypes, which leads to higher hardware, computational, and operational costs.\nThis paper proposes a cross-modality knowledge transfer (CMKT) methodology that\ntransfers knowledge from a source to a target modality for LAM in-situ\nmonitoring. CMKT enhances the usefulness of the features extracted from the\ntarget modality during the training phase and removes the sensors of the source\nmodality during the prediction phase. This paper proposes three CMKT methods:\nsemantic alignment, fully supervised mapping, and semi-supervised mapping.\nSemantic alignment establishes a shared encoded space between modalities to\nfacilitate knowledge transfer. It utilizes a semantic alignment loss to align\nthe distributions of the same classes (e.g., visual defective and audio\ndefective classes) and a separation loss to separate the distributions of\ndifferent classes (e.g., visual defective and audio defect-free classes). The\ntwo mapping methods transfer knowledge by deriving the features of one modality\nfrom the other modality using fully supervised and semi-supervised learning.\nThe proposed CMKT methods were implemented and compared with multimodal\naudio-visual fusion in an LAM in-situ anomaly detection case study. The\nsemantic alignment method achieves a 98.4% accuracy while removing the audio\nmodality during the prediction phase, which is comparable to the accuracy of\nmultimodal fusion (98.2%).","PeriodicalId":501309,"journal":{"name":"arXiv - CS - Computational Engineering, Finance, and Science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computational Engineering, Finance, and Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.05307","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Various machine learning (ML)-based in-situ monitoring systems have been developed to detect laser additive manufacturing (LAM) process anomalies and defects. Multimodal fusion can improve in-situ monitoring performance by acquiring and integrating data from multiple modalities, including visual and audio data. However, multimodal fusion employs multiple sensors of different types, which leads to higher hardware, computational, and operational costs. This paper proposes a cross-modality knowledge transfer (CMKT) methodology that transfers knowledge from a source modality to a target modality for LAM in-situ monitoring. CMKT enhances the usefulness of the features extracted from the target modality during the training phase and removes the sensors of the source modality during the prediction phase. This paper proposes three CMKT methods: semantic alignment, fully supervised mapping, and semi-supervised mapping. Semantic alignment establishes a shared encoded space between modalities to facilitate knowledge transfer. It uses a semantic alignment loss to align the distributions of the same classes across modalities (e.g., the visual defective and audio defective classes) and a separation loss to separate the distributions of different classes (e.g., the visual defective and audio defect-free classes). The two mapping methods transfer knowledge by deriving the features of one modality from the other using fully supervised and semi-supervised learning. The proposed CMKT methods were implemented and compared with multimodal audio-visual fusion in an LAM in-situ anomaly detection case study. The semantic alignment method achieved 98.4% accuracy while removing the audio modality during the prediction phase, comparable to the accuracy of multimodal fusion (98.2%).
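
To make the semantic alignment method concrete, the following PyTorch-style sketch shows one plausible form of the two losses described above: an alignment term that pulls same-class cross-modal feature pairs together and a margin-based separation term that pushes different-class pairs apart. The shapes, label encoding, and `margin` parameter are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_losses(z_vis, z_aud, y_vis, y_aud, margin=1.0):
    """Sketch of the two losses in the semantic alignment method.

    z_vis: (N, d) visual features encoded into the shared space.
    z_aud: (M, d) audio features encoded into the shared space.
    y_vis, y_aud: integer class labels (e.g., 0 = defect-free, 1 = defective).
    margin: hypothetical separation margin (not specified in the abstract).
    """
    # Pairwise Euclidean distances between every visual/audio feature pair.
    dists = torch.cdist(z_vis, z_aud)                  # (N, M)
    same = y_vis.unsqueeze(1) == y_aud.unsqueeze(0)    # (N, M) boolean mask

    # Alignment loss: draw same-class cross-modal pairs together
    # (e.g., visual defective with audio defective).
    align_loss = (dists[same] ** 2).mean()

    # Separation loss: push different-class cross-modal pairs at least
    # `margin` apart (e.g., visual defective vs. audio defect-free).
    sep_loss = (F.relu(margin - dists[~same]) ** 2).mean()

    return align_loss, sep_loss
```

During training, these two terms would be combined with the usual classification loss; their relative weighting is not given in the abstract and would have to be tuned.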
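
The two mapping methods can be sketched in a similar spirit: a small regression network learns to derive pseudo-source-modality features (here, audio) from target-modality features (here, visual), so the audio sensor can be dropped at prediction time. The architecture, dimensions, and helper names below are assumptions for illustration only.

```python
import torch.nn as nn
import torch.nn.functional as F

class VisualToAudioMapper(nn.Module):
    """Hypothetical mapper that predicts audio features from visual features."""

    def __init__(self, vis_dim=128, aud_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vis_dim, 256),
            nn.ReLU(),
            nn.Linear(256, aud_dim),
        )

    def forward(self, z_vis):
        return self.net(z_vis)

def fully_supervised_mapping_loss(mapper, z_vis, z_aud):
    # Fully supervised mapping: regress the mapped visual features onto
    # the true audio features of paired, labeled samples.
    return F.mse_loss(mapper(z_vis), z_aud)
```

The semi-supervised variant would additionally exploit partially labeled data; at prediction time only the visual branch and the trained mapper are used, which is how the audio modality is removed.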