A Re-trained Model Based On Multi-kernel Convolutional Neural Network for Acoustic Scene Classification

2020 RIVF International Conference on Computing and Communication Technologies (RIVF)
Pub Date: 2020-10-01
DOI: 10.1109/RIVF48685.2020.9140774

Tuan N. Nguyen, D. Ngo, L. Pham, Linh Tran, Trang Hoang

Citations: 2

Abstract

This paper proposes a deep learning framework for Acoustic Scene Classification (ASC), the task of identifying the location where an audio recording was made. For front-end feature extraction, we use three types of spectrogram: Gammatone (GAM), log-Mel, and Constant Q Transform (CQT). For back-end classification, we present a re-trained model that uses a multi-kernel CDNN-based architecture for the pre-training process and a DNN-based network for the post-training process. Our results on the DCASE 2016 dataset show a significant improvement, increasing by nearly 8% compared to the DCASE baseline of 77.2%.
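As an illustration of the log-Mel front end named in the abstract, the following is a minimal sketch of how a single audio frame can be turned into log-Mel band energies. It is not the authors' pipeline: the frame length, sample rate, number of Mel bands, Hann window, and the common 2595·log10(1 + f/700) Mel formula are all assumptions chosen for brevity, and the Gammatone and CQT front ends are not shown. A practical system would use an FFT library rather than the naive DFT used here.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # Common Mel-scale formula (an assumption; the paper's variant is not stated).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters over the n_fft // 2 + 1 DFT bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive DFT (fine for short frames)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_frame(frame, sample_rate, n_mels=8, eps=1e-10):
    """Log of the Mel-band energies for one frame; eps avoids log(0)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    return [math.log(sum(w * p for w, p in zip(row, spectrum)) + eps)
            for row in bank]


# Hypothetical usage: a 1 kHz test tone sampled at 8 kHz, one 128-sample frame.
sample_rate = 8000
n_fft = 128
tone = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(n_fft)]
features = log_mel_frame(tone, sample_rate)
peak_band = max(range(len(features)), key=features.__getitem__)
```

Stacking such frame vectors over time gives the two-dimensional log-Mel spectrogram that a convolutional back end would take as input.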

