{"title":"通过预训练音频模型的低库适应性微调改进异常声音检测","authors":"Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang","doi":"arxiv-2409.07016","DOIUrl":null,"url":null,"abstract":"Anomalous Sound Detection (ASD) has gained significant interest through the\napplication of various Artificial Intelligence (AI) technologies in industrial\nsettings. Though possessing great potential, ASD systems can hardly be readily\ndeployed in real production sites due to the generalization problem, which is\nprimarily caused by the difficulty of data collection and the complexity of\nenvironmental factors. This paper introduces a robust ASD model that leverages\naudio pre-trained models. Specifically, we fine-tune these models using machine\noperation data, employing SpecAug as a data augmentation strategy.\nAdditionally, we investigate the impact of utilizing Low-Rank Adaptation (LoRA)\ntuning instead of full fine-tuning to address the problem of limited data for\nfine-tuning. Our experiments on the DCASE2023 Task 2 dataset establish a new\nbenchmark of 77.75% on the evaluation set, with a significant improvement of\n6.48% compared with previous state-of-the-art (SOTA) models, including top-tier\ntraditional convolutional networks and speech pre-trained models, which\ndemonstrates the effectiveness of audio pre-trained models with LoRA tuning.\nAblation studies are also conducted to showcase the efficacy of the proposed\nscheme.","PeriodicalId":501284,"journal":{"name":"arXiv - EE - Audio and Speech Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models\",\"authors\":\"Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang\",\"doi\":\"arxiv-2409.07016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Anomalous Sound Detection (ASD) has gained significant interest through the\\napplication of various Artificial Intelligence (AI) technologies in industrial\\nsettings. Though possessing great potential, ASD systems can hardly be readily\\ndeployed in real production sites due to the generalization problem, which is\\nprimarily caused by the difficulty of data collection and the complexity of\\nenvironmental factors. This paper introduces a robust ASD model that leverages\\naudio pre-trained models. Specifically, we fine-tune these models using machine\\noperation data, employing SpecAug as a data augmentation strategy.\\nAdditionally, we investigate the impact of utilizing Low-Rank Adaptation (LoRA)\\ntuning instead of full fine-tuning to address the problem of limited data for\\nfine-tuning. 
Our experiments on the DCASE2023 Task 2 dataset establish a new\\nbenchmark of 77.75% on the evaluation set, with a significant improvement of\\n6.48% compared with previous state-of-the-art (SOTA) models, including top-tier\\ntraditional convolutional networks and speech pre-trained models, which\\ndemonstrates the effectiveness of audio pre-trained models with LoRA tuning.\\nAblation studies are also conducted to showcase the efficacy of the proposed\\nscheme.\",\"PeriodicalId\":501284,\"journal\":{\"name\":\"arXiv - EE - Audio and Speech Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - EE - Audio and Speech Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07016\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Audio and Speech Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
Anomalous Sound Detection (ASD) has attracted significant interest as various Artificial Intelligence (AI) technologies are applied in industrial settings. Despite its great potential, an ASD system can rarely be deployed directly at a real production site because of the generalization problem, which stems primarily from the difficulty of data collection and the complexity of environmental factors. This paper introduces a robust ASD model that leverages pre-trained audio models. Specifically, we fine-tune these models on machine operation data, employing SpecAug as a data augmentation strategy.
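
The abstract gives no implementation details, so the following is only a minimal sketch of SpecAug-style masking on a log-mel spectrogram, using torchaudio's FrequencyMasking and TimeMasking; the 16 kHz front end, the number of mel bins, and the mask widths are illustrative assumptions rather than the authors' settings.

import torch
import torchaudio

# Assumed front end: 16 kHz audio, 128 mel bins (not taken from the paper).
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=128)
freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=16)  # mask up to 16 mel bins
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=32)       # mask up to 32 frames

def augment(waveform: torch.Tensor) -> torch.Tensor:
    """Return a log-mel spectrogram with random frequency and time masks applied."""
    spec = torch.log(mel(waveform) + 1e-6)   # (channels, n_mels, frames)
    spec = freq_mask(spec)                   # zero out a random band of mel bins
    spec = time_mask(spec)                   # zero out a random span of frames
    return spec

aug = augment(torch.randn(1, 16000))         # one second of dummy 16 kHz audio

In training, such masking is typically applied on the fly to each mini-batch, so the model rarely sees exactly the same spectrogram twice.
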
Additionally, we investigate the impact of using Low-Rank Adaptation (LoRA) tuning instead of full fine-tuning to address the problem of limited fine-tuning data.
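
As an illustration of LoRA tuning in general (not the authors' exact configuration), the PyTorch sketch below wraps a frozen linear projection of a pre-trained encoder with a trainable low-rank update; the hidden size of 768, rank r = 8, and alpha = 16 are assumed values.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # keep pre-trained weights frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + (alpha / r) * B A x, with only A and B receiving gradients
        return self.base(x) + self.scale * (x @ self.lora_a.T) @ self.lora_b.T

layer = LoRALinear(nn.Linear(768, 768), r=8)            # stands in for an attention projection
out = layer(torch.randn(4, 100, 768))                   # (batch, frames, hidden)

Because only lora_a and lora_b are trained, the number of updated parameters stays small, which is the property that makes LoRA attractive when fine-tuning data is limited.
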
Our experiments on the DCASE2023 Task 2 dataset establish a new benchmark of 77.75% on the evaluation set, a significant improvement of 6.48% over previous state-of-the-art (SOTA) models, including top-tier traditional convolutional networks and speech pre-trained models, which demonstrates the effectiveness of audio pre-trained models with LoRA tuning. Ablation studies are also conducted to showcase the efficacy of the proposed scheme.
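
The abstract reports the 77.75% benchmark without naming the metric. DCASE Task 2 results are conventionally summarized as the harmonic mean of AUC and partial AUC (FPR ≤ 0.1) over machine types, so the sketch below shows that conventional computation; treating 77.75% as this official score is an assumption, and the random data is purely illustrative.

import numpy as np
from scipy.stats import hmean
from sklearn.metrics import roc_auc_score

def machine_scores(y_true: np.ndarray, anomaly_score: np.ndarray) -> tuple:
    """AUC and partial AUC (FPR <= 0.1) for one machine type."""
    auc = roc_auc_score(y_true, anomaly_score)
    pauc = roc_auc_score(y_true, anomaly_score, max_fpr=0.1)
    return auc, pauc

# Toy example: random anomaly scores for two machine types.
rng = np.random.default_rng(0)
per_machine = []
for _ in range(2):
    y = rng.integers(0, 2, size=200)    # 0 = normal, 1 = anomalous
    s = rng.random(200)                 # anomaly scores from a detector
    per_machine.extend(machine_scores(y, s))

official_score = hmean(per_machine)     # harmonic mean over all AUC / pAUC values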