Classification of Short Audio Acoustic Scenes Based on Data Augmentation Methods

Xuan Zhang, Yunfei Shao, Jun-Xiang Xu, Yong Ma, Wei-Qiang Zhang
Published in: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Publication date: 2022-11-07
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980120

Abstract

Classifying short audio clips into acoustic scenes is a new challenge posed by Task 1 of the DCASE 2022 Challenge. This paper describes our exploration of the problem and the architecture we used. The architecture is based on SegNet, with an instance normalization layer added at conv_block 1 of the encoder and at deconv_block 2 of the decoder to normalize the activations of the preceding layer. Log-mel spectrograms, delta features, and delta-delta features were extracted to train the acoustic scene classification model. Six data augmentation methods were applied: mixup, time- and frequency-domain masking, image augmentation, auto level, pix2pix, and random crop. To reduce model complexity, we applied three model compression schemes: pruning, quantization, and knowledge distillation. The proposed system achieved higher classification accuracy than the baseline system. Our model reached an average accuracy of 60.58% on the test split of the TAU Urban Acoustic Scenes 2022 Mobile development dataset. After compression, the model achieved an average accuracy of 54.11% with 127.2 K parameters, 8-bit quantization, and fewer than 30 MMACs (million multiply-accumulate operations).
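To make the feature and augmentation steps concrete, here is a minimal NumPy sketch of three of the ingredients named in the abstract: stacking a log-mel spectrogram with delta and delta-delta channels, mixup, and time and frequency masking. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give hyperparameters, so the function names, the Beta parameter `alpha`, the mask widths, and the spectrogram shape are all hypothetical, and `np.gradient` is used as a simple stand-in for the regression-based delta computation found in audio toolkits.

```python
import numpy as np


def delta(features, axis=-1):
    """Approximate delta features as a finite difference along the time axis.

    Assumption: a plain gradient stands in for the usual regression-based delta.
    """
    return np.gradient(features, axis=axis)


def stack_features(log_mel):
    """Stack log-mel, delta, and delta-delta into a 3-channel input."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return np.stack([log_mel, d1, d2], axis=0)


def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Blend two examples and their one-hot labels with a Beta-distributed weight.

    Assumption: alpha=0.4 is an illustrative value, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


def time_freq_mask(spec, max_freq_width=8, max_time_width=8, rng=None):
    """Zero out one random band of mel bins and one random band of frames.

    Assumption: one mask per axis and the maximum widths are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_mels, n_frames = out.shape

    # Frequency mask: a contiguous block of mel bins.
    f = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: a contiguous block of frames.
    t = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical shapes: 40 mel bins, 44 frames, 10 scene classes.
    spec_a = rng.standard_normal((40, 44))
    spec_b = rng.standard_normal((40, 44))
    label_a = np.eye(10)[3]
    label_b = np.eye(10)[7]

    mixed_x, mixed_y, lam = mixup(spec_a, label_a, spec_b, label_b, rng=rng)
    masked = time_freq_mask(mixed_x, rng=rng)
    model_input = stack_features(masked)
    print(model_input.shape, mixed_y)
```

In this sketch the augmentations are applied to the single-channel log-mel spectrogram before the delta channels are computed. Whether the paper augments before or after feature stacking is not stated in the abstract, so that ordering is also an assumption.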