Title: Self-Subtraction Network for End to End Noise Robust Classification
Authors: Donghyeon Kim, D. Han, Hanseok Ko
Venue: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
Publication date: 2019-09-01
DOI: 10.1109/AVSS.2019.8909821 (https://doi.org/10.1109/AVSS.2019.8909821)
Citations: 0
Abstract
Acoustic event classification in surveillance applications typically employs deep learning-based end-to-end methods, but in real environments their performance degrades significantly due to noise. While various approaches have been proposed to overcome the noise problem, most rely on supervised learning-based feature representation. A supervised learning system, however, requires pairs of noise-free and noisy audio streams, and acquiring ground-truth and noisy acoustic event data demands significant effort to adequately capture the variety of noise types needed for training. This paper proposes a novel learning method for noise-robust acoustic event classification in an end-to-end framework, named the Self-Subtraction Network (SSN). SSN extracts noise features from an input audio spectrogram and removes them from the input using LSTMs and an auto-encoder. Applied to the UrbanSound8K dataset with eight noise types at four different levels, our method demonstrates improved performance compared to state-of-the-art methods.
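The core idea the abstract describes — estimate the noise content of a spectrogram and subtract it from the input before classification — can be illustrated with a much simpler classical stand-in. The sketch below is *not* the paper's SSN (which learns the noise estimate with LSTMs and an auto-encoder); it uses a hand-crafted exponential moving average per frequency bin as the noise tracker, in the spirit of classical spectral subtraction. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def subtract_noise_estimate(spectrogram: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """Crude stand-in for SSN's learned noise removal.

    Tracks a running noise estimate for each frequency bin with an
    exponential moving average over time frames (the paper instead
    learns this estimate with LSTMs and an auto-encoder), subtracts
    it from each frame, and clamps at zero.

    spectrogram: (freq_bins, time_frames) magnitude array.
    alpha: EMA smoothing factor; higher = slower-moving noise estimate.
    """
    _, time_frames = spectrogram.shape
    noise_est = spectrogram[:, 0].copy()      # initialize from the first frame
    cleaned = np.empty_like(spectrogram)
    for t in range(time_frames):
        # update the per-bin noise estimate, then subtract it from the frame
        noise_est = alpha * noise_est + (1.0 - alpha) * spectrogram[:, t]
        cleaned[:, t] = np.maximum(spectrogram[:, t] - noise_est, 0.0)
    return cleaned
```

A simple EMA can only track roughly stationary noise and will also erode any signal component that stays constant over time; the appeal of a learned subtraction network like SSN is precisely that it can separate event features from non-stationary noise without such hand-tuned assumptions.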