Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Publication date: 2019-11-01. DOI: 10.1109/APSIPAASC47483.2019.9023057

Rui Wang, Mou Wang, Xiao-Lei Zhang, S. Rahardja


Citations: 7

Abstract

Acoustic scene classification (ASC) is the task of predicting the acoustic environment in which an audio recording was made. Because the training and test conditions in most real-world ASC problems do not match, domain adaptation methods are needed to address the cross-domain problem. In this paper, we propose an ASC method based on a domain adaptation neural network (DANN). Specifically, we first extract log-Mel spectrograms, an acoustic feature that previous studies have shown to be effective. Then, we train a DANN to project the training and test domains into a common space in which the acoustic scenes are classified jointly. To boost overall performance, we further train an ensemble of convolutional neural network (CNN) models, each with different parameter settings. Finally, we fuse the DANN and CNN models by averaging their outputs. We evaluated the proposed method on subtask B of task 1 of the DCASE 2019 ASC challenge, a closed-set classification problem whose audio was recorded with mismatched devices. Experimental results demonstrate the effectiveness of the proposed method for acoustic scene classification in mismatched conditions.
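The fusion step described in the abstract, averaging the outputs of the DANN and the CNN models, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `average_fusion` and `predict`, the three-class setup, and all score values below are hypothetical, and the sketch assumes each model emits one per-class score vector per audio clip with equal weight given to every model.

```python
from typing import List, Sequence


def average_fusion(
    model_outputs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average per-class scores across models.

    model_outputs[m][i][c] is the score that model m assigns to class c
    for clip i. All models must score the same clips and classes.
    """
    if not model_outputs:
        raise ValueError("need at least one model")
    n_models = len(model_outputs)
    n_clips = len(model_outputs[0])
    fused = []
    for i in range(n_clips):
        n_classes = len(model_outputs[0][i])
        fused.append([
            sum(model[i][c] for model in model_outputs) / n_models
            for c in range(n_classes)
        ])
    return fused


def predict(fused: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest fused score for each clip."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Hypothetical posteriors for 2 clips and 3 scene classes (made-up numbers).
dann_scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
cnn_a_scores = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
cnn_b_scores = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]

fused = average_fusion([dann_scores, cnn_a_scores, cnn_b_scores])
predictions = predict(fused)
print(predictions)  # [1, 2]
```

In this toy example the DANN alone would label the first clip as class 0, but the averaged scores favor class 1, which shows how an unweighted average lets the ensemble override a single model. Whether the paper averages posteriors, logits, or some other output is not stated in the abstract.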



