Feature extraction using discrete wavelet transform for speech recognition

Proceedings of the IEEE SoutheastCon 2000. 'Preparing for The New Millennium' (Cat. No.00CH37105)
Pub Date: 2000-04-07
DOI: 10.1109/SECON.2000.845444

Z. Tufekci, J. Gowdy


Citations: 103

Abstract

We propose a new feature vector consisting of mel-frequency discrete wavelet coefficients (MFDWC). The MFDWC are obtained by applying the discrete wavelet transform (DWT) to the mel-scaled log filterbank energies of a speech frame. The purpose of using the DWT is to benefit from its localization property in the time and frequency domains. MFDWC are similar to subband-based (SUB) features and multi-resolution (MULT) features in that all three attempt to achieve good time and frequency localization. However, MFDWC have better time/frequency localization than SUB and MULT features. We evaluated the new features on clean and noisy speech and compared MFDWC with mel-frequency cepstral coefficients (MFCC), SUB features, and MULT features. Experimental results on a phoneme recognition task showed that an MFDWC-based recognizer gave better results than recognizers based on MFCC, SUB features, and MULT features for white Gaussian noise, band-limited white Gaussian noise, and clean speech.
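The pipeline described in the abstract (mel-scaled log filterbank energies of one speech frame, followed by a DWT) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the abstract does not state the wavelet, the number of filters, the window, or the sampling rate, so the Haar wavelet, 8 filters, Hamming window, 8 kHz rate, and all function names here are assumptions chosen to keep the example short and dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame (direct DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    # Fractional DFT-bin positions of the filter edges and centres.
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            if left < k <= centre:
                weights.append((k - left) / (centre - left))
            elif centre < k < right:
                weights.append((right - k) / (right - centre))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def haar_dwt(values):
    """Full-depth orthonormal Haar DWT (assumed wavelet, see lead-in).

    Returns [approximation, coarsest detail, ..., finest details].
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(values)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


def mfdwc(frame, sample_rate=8000, num_filters=8):
    """DWT of the mel-scaled log filterbank energies of one frame."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(weights, spectrum)), 1e-10))
        for weights in bank
    ]
    return haar_dwt(log_energies)


# Usage: one 256-sample frame of a synthetic 440 Hz tone.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
features = mfdwc(frame)
print(len(features))
```

Because each wavelet coefficient depends only on a neighbouring group of filterbank channels, a disturbance confined to a few channels affects only some coefficients, whereas the cosine transform used for MFCC spreads it across all of them. That is one way to read the localization property the abstract refers to.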

