Speech vs music discrimination using Empirical Mode Decomposition

2015 Twenty First National Conference on Communications (NCC). Pub Date: 2015-04-16. DOI: 10.1109/NCC.2015.7084865

B. K. Khonglah, Rajib Sharma, S. Prasanna

Citations: 7

Abstract

This work explores the use of Empirical Mode Decomposition (EMD) for discriminating speech regions from music regions in audio recordings. The Intrinsic Mode Functions (IMFs) obtained from EMD of the audio signal, which capture its different frequency scales, are found to contain evidence that distinguishes speech regions from music regions. Statistical measures (mean, absolute mean, variance, skewness and kurtosis) are computed from the various IMFs and investigated for speech vs music discrimination. When these features are used with Support Vector Machine (SVM) and k-Nearest Neighbour (k-NN) classifiers on the Scheirer and Slaney database, the best overall classification accuracy is 90.83% for the SVM and 85.33% for the k-NN.
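The per-IMF statistics named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the IMFs have already been produced by some EMD routine and are passed in as plain lists of samples, and the function names `imf_statistics` and `feature_vector` are hypothetical. The abstract does not say which estimators were used, so biased (population) moments and non-excess kurtosis are assumed here.

```python
def imf_statistics(imf):
    """Return [mean, absolute mean, variance, skewness, kurtosis] for one IMF.

    Assumption: biased (population) central moments and non-excess
    kurtosis; the abstract does not specify the estimators.
    """
    n = len(imf)
    mean = sum(imf) / n
    abs_mean = sum(abs(x) for x in imf) / n
    # Second, third and fourth central moments.
    m2 = sum((x - mean) ** 2 for x in imf) / n
    m3 = sum((x - mean) ** 3 for x in imf) / n
    m4 = sum((x - mean) ** 4 for x in imf) / n
    # Guard against a constant IMF, where the variance is zero.
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return [mean, abs_mean, m2, skewness, kurtosis]


def feature_vector(imfs):
    """Concatenate the five statistics of every IMF into one flat list."""
    features = []
    for imf in imfs:
        features.extend(imf_statistics(imf))
    return features
```

Calling `feature_vector` on the IMFs of an audio segment yields five values per IMF. Vectors of this kind, one per labelled segment, are what a classifier such as an SVM or k-NN would then be trained on.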

