Audio-based event detection in office live environments using optimized MFCC-SVM approach

Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015). Pub Date: 2015-03-02. DOI: 10.1109/ICOSC.2015.7050855

Selver Ezgi Küçükbay, M. Sert


Citations: 29

Abstract

Audio data contains many kinds of sounds and is an important source for multimedia applications. One kind is unstructured environmental sounds (also referred to as audio events), which have noise-like characteristics with flat spectra. Recognition methods developed for music and speech data are therefore generally not appropriate for environmental sounds. In this paper, we propose an MFCC-SVM based approach that exploits the effect of feature representation and learner optimization on the efficient recognition of audio events from audio signals. The proposed approach seeks an efficient representation of MFCC features by varying the window and hop sizes and the number of Mel coefficients used in the analysis, and by optimizing the SVM parameters. The evaluations use 16 audio events from the IEEE Audio and Acoustic Signal Processing (AASP) Challenge Dataset, collected in live office environments: alert, clear throat, cough, door slam, drawer, keyboard, keys, knock, laughter, mouse, page turn, pen drop, phone, printer, speech, and switch. With the selected MFCC feature and SVM classifier settings, tests using 5-fold cross-validation give Precision, Recall, and F-measure scores of 62%, 58%, and 55%, respectively. Extensive experiments on audio-based event detection using the IEEE AASP Challenge dataset show the effectiveness of the proposed approach.
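The search over framing and classifier settings described in the abstract, and the way its three scores could be computed, can be illustrated with a minimal standard-library sketch. Everything specific here is an assumption for illustration: the abstract does not list the window sizes, hop sizes, coefficient counts, or SVM parameter values that were tried, nor does it say how the scores were averaged across the 16 classes. The names `frame_count`, `SEARCH_SPACE`, `configurations`, and `macro_scores` are hypothetical, and no MFCC extraction or SVM training is performed.

```python
from collections import Counter
from itertools import product


def frame_count(n_samples, window, hop):
    """Number of full analysis frames (no padding) in a signal of n_samples."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


# Hypothetical search space; these values are NOT taken from the paper.
SEARCH_SPACE = {
    "window": [400, 800],   # frame length in samples (25/50 ms at an assumed 16 kHz)
    "hop": [160, 400],      # frame shift in samples
    "n_coeffs": [13, 20],   # number of MFCCs kept per frame
    "svm_C": [1.0, 10.0],   # SVM regularisation parameter
}


def configurations(space):
    """Yield every combination of settings whose hop does not exceed the window."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        if cfg["hop"] <= cfg["window"]:
            yield cfg


def macro_scores(y_true, y_pred):
    """Macro-averaged (precision, recall, F-measure) over the classes in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative
    precisions, recalls, f_scores = [], [], []
    for c in sorted(set(y_true)):
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n


# Toy labels (two of the 16 event classes), invented for illustration.
y_true = ["knock"] * 5 + ["cough"] * 5
y_pred = ["knock"] + ["cough"] * 9
precision, recall, f_measure = macro_scores(y_true, y_pred)
```

On the toy labels, the macro-averaged scores are roughly 0.78, 0.60, and 0.52. Averaging per-class F-measures can therefore yield a value below both the averaged precision and the averaged recall, which is the same pattern as the reported 62%, 58%, and 55%. Macro-averaging is only one plausible reading; the abstract does not state which averaging was used.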

