Acoustic Event Detection with MobileNet and 1D-Convolutional Neural Network

2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET). Published: 2020-09-26. DOI: 10.1109/IICAIET49801.2020.9257865

Pooi Shiang Tan, K. Lim, C. Lee, C. Tan

Abstract

Sound is a form of energy, produced by a vibrating object, that travels as waves through a medium and can be heard. It is used in human communication, music, alerts, and more. Sound also helps us recognize the events occurring at a given moment and thereby gives us cues about what is happening around us. This has prompted researchers to study how humans identify events from sound and, in recent years, how to equip machines with the same ability, i.e., acoustic event detection. This study addresses acoustic event detection by combining the frequency spectrogram technique with deep learning methods. First, a spectrogram image is generated from the acoustic data using the frequency spectrogram technique. The spectrogram is then fed into a pre-trained MobileNet model to extract robust feature representations. A one-dimensional convolutional neural network (1D-CNN) is trained on these MobileNet features to perform acoustic event detection. The proposed 1D-CNN consists of several alternating convolution and pooling layers; the output of the last pooling layer is flattened and fed into a fully connected layer that classifies the events, and dropout is employed to prevent overfitting. The proposed pipeline (frequency spectrogram, pre-trained MobileNet, and 1D-CNN) is evaluated on three datasets: Soundscapes1, Soundscapes2, and UrbanSound8k. In the experiments, the proposed method achieved F1-scores of 81, 86, and 70 on Soundscapes1, Soundscapes2, and UrbanSound8k, respectively.
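
The first stage converts audio into a frequency spectrogram. The abstract does not give the windowing or resolution settings, so the sketch below assumes a periodic Hann window, a 256-sample frame, and a 128-sample hop, and it uses a naive DFT in pure Python so the arithmetic stays visible. A practical implementation would call an FFT library, and the later step of turning the magnitudes into an image sized for MobileNet (for example log scaling and resizing) is not shown. The function names `hann_window` and `magnitude_spectrogram` are illustrative, not taken from the paper.

```python
import cmath
import math


def hann_window(n: int) -> list[float]:
    """Periodic Hann window of length n (an assumed windowing choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal: list[float], frame_len: int = 256,
                          hop: int = 128) -> list[list[float]]:
    """Return |STFT| as a list of frames, each with frame_len // 2 + 1 bins.

    frame_len and hop are illustrative defaults, not values from the paper.
    A naive O(N^2) DFT is used for clarity; real code would use an FFT.
    """
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        row = []
        for k in range(n_bins):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(chunk))
            row.append(abs(acc))  # keep magnitude, discard phase
        frames.append(row)
    return frames


if __name__ == "__main__":
    rate = 8000  # assumed sample rate for this demo
    tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(800)]
    spec = magnitude_spectrogram(tone)
    # Each bin spans rate / frame_len = 31.25 Hz, so a 1 kHz tone peaks in bin 32.
    print([max(range(len(row)), key=row.__getitem__) for row in spec])
```

Each row is one time frame and each column one frequency bin; stacking the rows gives the time-frequency image that the pipeline passes to the pre-trained network.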
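
The classifier stage can be sketched as a forward pass in plain Python. The abstract specifies only alternating convolution and pooling layers, a flattened last pooling output feeding a fully connected layer, and dropout. Everything else below is an assumption: the MobileNet output is treated as a single-channel 1D sequence, there are two convolution blocks with 4 and 8 filters of width 3, pooling size is 2, dropout with rate 0.5 is applied to the flattened vector, and a softmax produces class probabilities. Weights are random placeholders and training (backpropagation) is omitted, so this shows data flow and tensor shapes only. Helper names such as `build_params` and `forward` are hypothetical.

```python
import math
import random
from typing import Optional

Tensor = list[list[float]]  # indexed as [channel][time step]


def conv1d_relu(x: Tensor, weights, bias) -> Tensor:
    """'Valid' 1D convolution (no padding) followed by ReLU.

    weights is indexed [out_channel][in_channel][tap]; bias is [out_channel].
    """
    k = len(weights[0][0])
    out_len = len(x[0]) - k + 1
    out = []
    for w_o, b_o in zip(weights, bias):
        row = []
        for t in range(out_len):
            s = b_o + sum(w_o[c][j] * x[c][t + j]
                          for c in range(len(x)) for j in range(k))
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool1d(x: Tensor, size: int = 2) -> Tensor:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]


def dropout(v: list[float], p: float, training: bool,
            rng: random.Random) -> list[float]:
    """Inverted dropout: in training, zero each unit with probability p and
    scale survivors by 1 / (1 - p); at inference it is the identity."""
    if not training or p == 0.0:
        return v
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in v]


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    total = sum(e)
    return [a / total for a in e]


def init(shape, rng: random.Random):
    """Hypothetical helper: nested lists of small uniform random weights."""
    if len(shape) == 1:
        return [rng.uniform(-0.1, 0.1) for _ in range(shape[0])]
    return [init(shape[1:], rng) for _ in range(shape[0])]


def build_params(length: int, channels=(4, 8), kernel: int = 3,
                 n_classes: int = 10, seed: int = 0) -> dict:
    """Hypothetical helper: random weights for conv/pool blocks plus one FC layer."""
    rng = random.Random(seed)
    conv, in_ch = [], 1
    for out_ch in channels:
        conv.append((init((out_ch, in_ch, kernel), rng), init((out_ch,), rng)))
        length = (length - kernel + 1) // 2  # valid conv, then pool of size 2
        in_ch = out_ch
    fc = (init((n_classes, in_ch * length), rng), init((n_classes,), rng))
    return {"conv": conv, "fc": fc}


def forward(features: list[float], params: dict, training: bool = False,
            p_drop: float = 0.5,
            rng: Optional[random.Random] = None) -> list[float]:
    """Class probabilities for one feature vector (assumed MobileNet output)."""
    x = [features]  # treat the vector as a single-channel sequence
    for w, b in params["conv"]:
        x = max_pool1d(conv1d_relu(x, w, b))  # alternating conv and pooling
    flat = [a for ch in x for a in ch]  # flatten the last pooling output
    flat = dropout(flat, p_drop, training, rng or random.Random(0))
    w_fc, b_fc = params["fc"]
    logits = [b + sum(wi * a for wi, a in zip(row, flat))
              for row, b in zip(w_fc, b_fc)]
    return softmax(logits)


if __name__ == "__main__":
    params = build_params(64)  # 64-dim input keeps the demo fast
    rng = random.Random(1)
    feats = [rng.random() for _ in range(64)]
    print(forward(feats, params))  # 10 probabilities summing to 1
```

In a real pipeline the input would be the feature vector MobileNet produces for one spectrogram, and the weights would be learned with a deep learning framework rather than drawn by `build_params`.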
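
Results are reported as F1-scores. The abstract does not state how per-class scores were averaged over the event classes, so the sketch below assumes macro averaging (the unweighted mean of per-class F1) and returns a value in [0, 1]; multiply by 100 for a percentage. It uses the identity F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall. The function names and the example labels are illustrative.

```python
from collections import Counter


def f1_per_class(y_true: list, y_pred: list) -> dict:
    """Per-class F1 = 2TP / (2TP + FP + FN) for single-label predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    # Every class in the union occurs at least once as a TP, FP, or FN,
    # so the denominator is never zero.
    return {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c])
            for c in set(y_true) | set(y_pred)}


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["siren", "dog_bark", "dog_bark", "car_horn"]
    y_pred = ["siren", "dog_bark", "car_horn", "car_horn"]
    print(f1_per_class(y_true, y_pred))  # siren 1.0, dog_bark and car_horn 2/3 each
    print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Macro averaging weights every event class equally, so rare classes affect the score as much as common ones; a frequency-weighted average would generally give a different number on imbalanced data.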
