{"title":"基于机器学习的激光增材制造原位监测视听跨模态知识转移","authors":"Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao","doi":"arxiv-2408.05307","DOIUrl":null,"url":null,"abstract":"Various machine learning (ML)-based in-situ monitoring systems have been\ndeveloped to detect laser additive manufacturing (LAM) process anomalies and\ndefects. Multimodal fusion can improve in-situ monitoring performance by\nacquiring and integrating data from multiple modalities, including visual and\naudio data. However, multimodal fusion employs multiple sensors of different\ntypes, which leads to higher hardware, computational, and operational costs.\nThis paper proposes a cross-modality knowledge transfer (CMKT) methodology that\ntransfers knowledge from a source to a target modality for LAM in-situ\nmonitoring. CMKT enhances the usefulness of the features extracted from the\ntarget modality during the training phase and removes the sensors of the source\nmodality during the prediction phase. This paper proposes three CMKT methods:\nsemantic alignment, fully supervised mapping, and semi-supervised mapping.\nSemantic alignment establishes a shared encoded space between modalities to\nfacilitate knowledge transfer. It utilizes a semantic alignment loss to align\nthe distributions of the same classes (e.g., visual defective and audio\ndefective classes) and a separation loss to separate the distributions of\ndifferent classes (e.g., visual defective and audio defect-free classes). The\ntwo mapping methods transfer knowledge by deriving the features of one modality\nfrom the other modality using fully supervised and semi-supervised learning.\nThe proposed CMKT methods were implemented and compared with multimodal\naudio-visual fusion in an LAM in-situ anomaly detection case study. The\nsemantic alignment method achieves a 98.4% accuracy while removing the audio\nmodality during the prediction phase, which is comparable to the accuracy of\nmultimodal fusion (98.2%).","PeriodicalId":501309,"journal":{"name":"arXiv - CS - Computational Engineering, Finance, and Science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing\",\"authors\":\"Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao\",\"doi\":\"arxiv-2408.05307\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Various machine learning (ML)-based in-situ monitoring systems have been\\ndeveloped to detect laser additive manufacturing (LAM) process anomalies and\\ndefects. Multimodal fusion can improve in-situ monitoring performance by\\nacquiring and integrating data from multiple modalities, including visual and\\naudio data. However, multimodal fusion employs multiple sensors of different\\ntypes, which leads to higher hardware, computational, and operational costs.\\nThis paper proposes a cross-modality knowledge transfer (CMKT) methodology that\\ntransfers knowledge from a source to a target modality for LAM in-situ\\nmonitoring. CMKT enhances the usefulness of the features extracted from the\\ntarget modality during the training phase and removes the sensors of the source\\nmodality during the prediction phase. 
This paper proposes three CMKT methods:\\nsemantic alignment, fully supervised mapping, and semi-supervised mapping.\\nSemantic alignment establishes a shared encoded space between modalities to\\nfacilitate knowledge transfer. It utilizes a semantic alignment loss to align\\nthe distributions of the same classes (e.g., visual defective and audio\\ndefective classes) and a separation loss to separate the distributions of\\ndifferent classes (e.g., visual defective and audio defect-free classes). The\\ntwo mapping methods transfer knowledge by deriving the features of one modality\\nfrom the other modality using fully supervised and semi-supervised learning.\\nThe proposed CMKT methods were implemented and compared with multimodal\\naudio-visual fusion in an LAM in-situ anomaly detection case study. The\\nsemantic alignment method achieves a 98.4% accuracy while removing the audio\\nmodality during the prediction phase, which is comparable to the accuracy of\\nmultimodal fusion (98.2%).\",\"PeriodicalId\":501309,\"journal\":{\"name\":\"arXiv - CS - Computational Engineering, Finance, and Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computational Engineering, Finance, and Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.05307\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computational Engineering, Finance, and Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.05307","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing
Various machine learning (ML)-based in-situ monitoring systems have been
developed to detect laser additive manufacturing (LAM) process anomalies and
defects. Multimodal fusion can improve in-situ monitoring performance by
acquiring and integrating data from multiple modalities, including visual and
audio data. However, multimodal fusion employs multiple sensors of different
types, which leads to higher hardware, computational, and operational costs.
This paper proposes a cross-modality knowledge transfer (CMKT) methodology that
transfers knowledge from a source to a target modality for LAM in-situ
monitoring. CMKT enhances the usefulness of the features extracted from the target modality during the training phase, so that the sensors of the source modality can be removed during the prediction phase. Three CMKT methods are proposed: semantic alignment, fully supervised mapping, and semi-supervised mapping.
Semantic alignment establishes a shared encoded space between modalities to
facilitate knowledge transfer. It utilizes a semantic alignment loss to align
the distributions of the same classes (e.g., visual defective and audio
defective classes) and a separation loss to separate the distributions of
different classes (e.g., visual defective and audio defect-free classes).
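As a rough illustration of these two loss terms, the sketch below writes them over class-conditional centroids of the visual and audio embeddings in the shared encoded space. This is a minimal sketch under assumed conventions, not the authors' implementation: the centroid formulation, function names, and margin value are all assumptions.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(z_visual, z_audio, labels_v, labels_a, margin=1.0):
    """Hypothetical centroid-based alignment and separation losses (sketch).

    z_visual: (Nv, d) visual embeddings; z_audio: (Na, d) audio embeddings;
    labels_*: integer class labels (e.g., 0 = defect-free, 1 = defective).
    Assumes every class appears in both modalities within the batch.
    """
    classes = torch.unique(torch.cat([labels_v, labels_a])).tolist()
    # Class-conditional centroids in the shared encoded space.
    cent_v = {c: z_visual[labels_v == c].mean(dim=0) for c in classes}
    cent_a = {c: z_audio[labels_a == c].mean(dim=0) for c in classes}
    align = z_visual.new_zeros(())
    separate = z_visual.new_zeros(())
    for c in classes:
        # Alignment: the same class across modalities (e.g., visual defective
        # and audio defective) should share a distribution.
        align = align + F.mse_loss(cent_v[c], cent_a[c])
        for c2 in classes:
            if c2 != c:
                # Separation: different classes across modalities (e.g.,
                # visual defective vs. audio defect-free) should stay at
                # least `margin` apart (hinge-style penalty).
                dist = torch.norm(cent_v[c] - cent_a[c2])
                separate = separate + F.relu(margin - dist)
    return align + separate
```

Minimizing the alignment term pulls, for example, the visual-defective and audio-defective distributions together, while the hinge-style separation term keeps, for example, the visual-defective and audio-defect-free centroids at least a margin apart.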
The two mapping methods transfer knowledge by deriving the features of one modality from the other modality using fully supervised and semi-supervised learning, respectively.
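A minimal sketch of the fully supervised variant is a small regression network that derives pseudo-audio features from visual features while both sensors are still installed; the network, layer sizes, and training snippet below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualToAudioMapper(nn.Module):
    """Hypothetical mapping network: visual features -> pseudo-audio features."""

    def __init__(self, visual_dim=128, audio_dim=64, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(visual_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, audio_dim),
        )

    def forward(self, z_visual):
        return self.net(z_visual)

mapper = VisualToAudioMapper()

# Training phase (both sensors installed): regress the true audio features
# from the paired visual features with a fully supervised loss.
z_visual = torch.randn(32, 128)  # batch of visual features (assumed dims)
z_audio = torch.randn(32, 64)    # paired audio features
loss = F.mse_loss(mapper(z_visual), z_audio)

# Prediction phase (microphone removed): the anomaly classifier consumes the
# visual features together with the mapped pseudo-audio features.
pseudo_audio = mapper(z_visual)
```

The semi-supervised variant would train the same kind of mapping when only part of the data has paired source-modality features; the snippet above covers only the fully supervised case.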
The proposed CMKT methods were implemented and compared with multimodal audio-visual fusion in an LAM in-situ anomaly detection case study. The semantic alignment method achieves an accuracy of 98.4% while removing the audio modality during the prediction phase, comparable to that of multimodal fusion (98.2%).