Improved Voice Activity Detection based on support vector machine with high separable speech feature vectors

2014 19th International Conference on Digital Signal Processing | Pub Date: 2014-09-18 | DOI: 10.1109/ICDSP.2014.6900767

Y. Zou, W. Zheng, Wei Shi, Hong Liu


Citations: 22

Abstract

Voice Activity Detection (VAD) is a key technique in many speech applications. Existing VAD algorithms perform unsatisfactorily under nonstationary noise and at low signal-to-noise ratio (SNR). Motivated by the fact that people can distinguish speech from non-speech even at low SNR, this paper studies VAD from a pattern recognition point of view, formulating it as a binary classification problem: the input signal is classified into speech and non-speech segments. A support vector machine (SVM) with a radial basis function (RBF) kernel is trained in a supervised manner, an approach well suited to binary classification tasks for which training samples are available. To improve the accuracy and noise robustness of VAD, feature selection is carried out with a class separation measure (CSM) criterion, which evaluates how well the extracted feature vectors separate speech from non-speech segments. The most widely used speech features are considered: Mel-frequency cepstral coefficients (MFCC), the principal component analysis of the MFCC (PCA-MFCC), linear predictive coding (LPC) and linear predictive cepstral coding (LPCC). Extensive experiments show that the MFCC features capture the most relevant speech information and remain well separable under different noise conditions, as do the PCA-MFCC features. Moreover, the PCA-MFCC features are more robust to noise and cost less to compute. Accordingly, a VAD method that uses PCA-MFCC features with an RBF-SVM classifier is developed, termed PCA-SVM-VAD for short. Experimental results on the NOIZEUS database show that the proposed PCA-SVM-VAD method clearly improves on other VAD methods and is much more robust in car noise at various SNRs.
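The abstract does not give the formula for the class separation measure, so the following is only a minimal sketch of one common way to score feature separability: a Fisher-style ratio of between-class scatter to within-class scatter, computed on labeled speech and non-speech feature vectors. The function name `class_separation`, the trace-ratio form, and the toy two-dimensional frames are all assumptions made for illustration and are not taken from the paper. A larger score means the two classes sit further apart relative to their own spread.

```python
def class_separation(speech, nonspeech):
    """Fisher-style separability score (an assumed stand-in for the CSM).

    Each argument is a non-empty list of equal-length feature vectors.
    Returns trace(between-class scatter) / trace(within-class scatter).
    Raises ZeroDivisionError if every frame equals its class mean.
    """
    dims = len(speech[0])
    n_s, n_n = len(speech), len(nonspeech)

    def mean_vec(frames):
        # Per-dimension mean over all frames of one class.
        return [sum(f[d] for f in frames) / len(frames) for d in range(dims)]

    def within(frames, mu):
        # Sum of squared distances from each frame to its class mean.
        return sum((f[d] - mu[d]) ** 2 for f in frames for d in range(dims))

    mu_s, mu_n = mean_vec(speech), mean_vec(nonspeech)
    # Global mean, weighted by the number of frames in each class.
    mu = [(n_s * mu_s[d] + n_n * mu_n[d]) / (n_s + n_n) for d in range(dims)]

    between = sum(
        n * (m[d] - mu[d]) ** 2
        for n, m in ((n_s, mu_s), (n_n, mu_n))
        for d in range(dims)
    )
    return between / (within(speech, mu_s) + within(nonspeech, mu_n))


# Toy frames (hypothetical values, not real MFCC or LPC features).
well_separated = class_separation([[0.0, 0.0], [2.0, 0.0]],
                                  [[10.0, 0.0], [12.0, 0.0]])
overlapping = class_separation([[0.0, 0.0], [2.0, 0.0]],
                               [[1.0, 0.0], [3.0, 0.0]])
print(well_separated, overlapping)  # 25.0 0.25
```

Under this assumed measure, candidate feature sets such as MFCC, PCA-MFCC, LPC and LPCC could be ranked by computing the score on the same labeled frames at each noise condition and preferring the set whose score stays highest as the SNR drops.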



