Adapted Spectrogram Transformer for Unsupervised Cross-Domain Acoustic Anomaly Detection

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Pub Date: 2022-11-07. DOI: 10.23919/APSIPAASC55919.2022.9980266

Gilles Van De Vyver, Zhaoyi Liu, Koustabh Dolui, D. Hughes, Sam Michiels

Citations: 0

Abstract

Anomaly detection models can help detect faults in industrial machines automatically and proactively. Microphones are appealing sensors because they are generally inexpensive and, unlike visual inspection, sound recordings can reveal information about the internals of a machine. However, conventional methods based on an AutoEncoder (AE) trained from scratch generally struggle to reconstruct samples robustly when little data is available. This paper addresses the problem with a method for unsupervised Acoustic Anomaly Detection (AAD) that adapts intermediate embeddings from a pretrained, self-attention-based spectrogram transformer. Transfer learning from a large, successful model offers a way to learn from limited data by reusing external knowledge; for AAD, this can help to recognize subtle anomalies. This work proposes two method variants that take advantage of Intermediate Feature Embeddings (IFEs) from the Audio Spectrogram Transformer (AST). The first, ADIFAST (Anomaly Detection from Intermediate Features extracted from AST), fits a Gaussian Mixture Model (GMM) on the IFEs produced by intermediate layers of the AST. The second, TELD (Transformer Encoder Linear Decoder network), uses the IFEs in a different, more effective way by adapting the AST to an AE structure. Both variants thus rely on the IFEs extracted by the AST. Evaluated on task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge, TELD improves the Area Under Curve (AUC) score by 3.9% on average for binary labeling of normal and anomalous samples in the target domain.
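The GMM-based idea behind ADIFAST can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the "embeddings" are synthetic 4-dimensional vectors standing in for AST intermediate features, and the diagonal covariance, the two mixture components, the plain EM loop, and all function names (`fit_gmm`, `anomaly_score`, `roc_auc`) are assumptions made for illustration. The sketch fits a GMM on normal data only, scores test samples by negative log-likelihood, and computes the AUC for separating normal from anomalous samples.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, k, iters=50, seed=0, min_var=1e-3):
    """Fit a k-component diagonal GMM with plain EM (illustrative only)."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(p) for p in rng.sample(data, k)]  # init means from data
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [
                math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                for j in range(k)
            ]
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # keeps the weight positive
            weights[j] = nj / len(data)
            means[j] = [
                sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                for d in range(dim)
            ]
            variances[j] = [
                max(
                    sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, data)) / nj,
                    min_var,  # variance floor to avoid collapse
                )
                for d in range(dim)
            ]
    return weights, means, variances


def anomaly_score(x, gmm):
    """Negative log-likelihood under the GMM; higher means more anomalous."""
    weights, means, variances = gmm
    return -log_sum_exp([
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])


def roc_auc(scores, labels):
    """AUC as the probability that an anomaly outscores a normal sample."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_cluster(rng, center, n, std=0.5):
    """Draw n synthetic 'embeddings' around a center (stand-in for IFEs)."""
    return [[rng.gauss(c, std) for c in center] for _ in range(n)]


rng = random.Random(42)
mode_a, mode_b = [0.0] * 4, [3.0] * 4       # two hypothetical normal modes
fault = [-4.0, 4.0, -4.0, 4.0]              # hypothetical anomalous region

train = make_cluster(rng, mode_a, 100) + make_cluster(rng, mode_b, 100)
test_normal = make_cluster(rng, mode_a, 25) + make_cluster(rng, mode_b, 25)
test_anomalous = make_cluster(rng, fault, 50)

gmm = fit_gmm(train, k=2)                   # trained on normal data only
test_scores = [anomaly_score(x, gmm) for x in test_normal + test_anomalous]
test_labels = [0] * len(test_normal) + [1] * len(test_anomalous)
auc = roc_auc(test_scores, test_labels)
print(f"AUC on synthetic data: {auc:.3f}")
```

In a real pipeline the synthetic vectors would be replaced by features taken from intermediate AST layers, and a detection threshold on the score would still have to be chosen; neither step is shown here.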
