Hazardous Sound Detection Based on Audio Augmentation

2021 International Symposium on Electrical, Electronics and Information Engineering
Pub Date: 2021-02-19
DOI: 10.1145/3459104.3459174

Jincheng Zhang, Baojun Wang, W. Shi, Jucai Lin, Jun Yin


Citations: 0

Abstract

The aim of surveillance is to detect the occurrence of dangerous events. Recently, with the wide use of deep learning, video surveillance has improved dramatically. For audio event detection in surveillance, deep learning methods have been applied to the task of hazardous sound classification. However, because dangerous sounds occur rarely and are costly to collect, no corresponding large-scale dataset exists. Large-scale datasets are essential for deep learning methods to achieve good results, so obtaining richer audio event data has become an urgent problem. Researchers now use a variety of data augmentation methods in computer vision, with clear performance gains, and these approaches are gradually being adopted in sound pattern recognition and automatic speech recognition (ASR). There is, however, little research on classifying hazardous sounds when only a small dataset is available. In this paper, various data augmentation methods are adopted for hazardous sound classification. Our results show that data augmentation brings a large improvement on all four classes of the dataset: classification accuracy increases by 0.5% on average, and as the scale of data augmentation increases, the accuracy gain rises to about 1.5%.
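The abstract does not say which augmentation methods were used. As an illustration only, the sketch below shows three common waveform-level augmentations (a random circular time shift, a random gain, and additive white Gaussian noise at a target signal-to-noise ratio) applied to labelled clips. A hypothetical `copies_per_clip` parameter stands in for the "scale of data augmentation". The choice of augmentations, every function name, and all parameter ranges are assumptions, not the authors' method. Clips are plain lists of float samples so that the example needs only the Python standard library.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers: illustrative waveform augmentations, NOT the paper's method.

def add_noise(signal: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise scaled so the result has roughly `snr_db` SNR."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def time_shift(signal: Sequence[float], max_shift: int, rng: random.Random) -> List[float]:
    """Circularly shift the waveform right by a random number of samples."""
    n = len(signal)
    k = rng.randint(-max_shift, max_shift)
    return [signal[(i - k) % n] for i in range(n)]

def random_gain(signal: Sequence[float], min_db: float, max_db: float,
                rng: random.Random) -> List[float]:
    """Scale the waveform by a gain drawn uniformly in decibels."""
    gain = 10 ** (rng.uniform(min_db, max_db) / 20)
    return [gain * x for x in signal]

def augment_dataset(clips: Sequence[Tuple[Sequence[float], str]],
                    copies_per_clip: int, seed: int = 0) -> List[Tuple[List[float], str]]:
    """Keep each (non-empty) original clip and add `copies_per_clip` augmented copies.

    `copies_per_clip` is a hypothetical knob for the scale of augmentation.
    """
    rng = random.Random(seed)
    augmented: List[Tuple[List[float], str]] = []
    for signal, label in clips:
        augmented.append((list(signal), label))  # original clip is kept unchanged
        for _ in range(copies_per_clip):
            x = time_shift(signal, max_shift=len(signal) // 10, rng=rng)
            x = random_gain(x, -6.0, 6.0, rng)
            x = add_noise(x, snr_db=rng.uniform(10.0, 30.0), rng=rng)
            augmented.append((x, label))  # label is unchanged by augmentation
    return augmented

# Toy usage: a 1 kHz tone sampled at 16 kHz stands in for one labelled clip.
sr = 16000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]
data = augment_dataset([(tone, "class_a")], copies_per_clip=3)
print(len(data))  # 4
```

Each augmented copy keeps its original label, so the training set grows by a factor of `1 + copies_per_clip`. In practice, the noise SNR range and the shift and gain limits would need tuning per dataset.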

