Leveraging deep neural networks with nonnegative representations for improved environmental sound classification

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6. Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168139

Victor Bisot, R. Serizel, S. Essid, G. Richard


Citations: 7

Abstract

This paper introduces the use of representations based on nonnegative matrix factorization (NMF) to train deep neural networks, with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network itself to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce an NMF-based feature learning stage before training deep networks, and show that it is useful especially for multi-source acoustic environments such as sound scenes. We rely on two established NMF techniques, one unsupervised and one supervised, to learn better input representations for deep neural networks. This allows simple architectures to reach performance competitive with more complex systems, such as convolutional networks, for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets, as well as the best systems from the 2016 DCASE challenge.
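The feature learning stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Euclidean cost with the standard multiplicative updates, uses random nonnegative data in place of real spectrograms, and covers only the unsupervised variant. The function names `nmf` and `project`, the matrix sizes, and the iteration counts are all hypothetical choices made for illustration.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (features x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumption;
    the paper's exact NMF variant may differ).
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, n_components)) + eps  # dictionary
    H = rng.random((n_components, n_frames)) + eps    # activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(V, W, n_iter=200, eps=1e-9, seed=0):
    """Compute activations for new data with the dictionary W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Toy stand-in for time-frequency data: 40 frequency bands per frame.
rng = np.random.default_rng(42)
V_train = rng.random((40, 100))  # 100 training frames
V_test = rng.random((40, 10))    # 10 test frames

# Learn the dictionary on training data, then project the test data onto it.
W, H_train = nmf(V_train, n_components=8)
H_test = project(V_test, W)

# One nonnegative feature vector per frame; these would be the network inputs.
features_train = H_train.T
features_test = H_test.T
```

In this sketch the rows of `features_train` and `features_test` take the place of the spectrogram frames that a network would otherwise receive directly. The network itself and the supervised NMF variant are omitted.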



