Normalization on the modulation spectrum of the subband temporal envelopes for automatic speech recognition in reverberant environments

Xugang Lu, M. Unoki, Satoshi Nakamura
Proceedings of the 3rd International Universal Communication Symposium, published 2009-12-03
DOI: https://doi.org/10.1145/1667780.1667832
Citations: 3

Abstract

In this study, we propose a feature extraction method for recognizing reverberant speech, based on subband temporal envelopes (STEs) and their normalization. The STEs are extracted with a bank of constant-bandwidth band-pass filters, each followed by a Hilbert transform and low-pass filtering. In the normalization step, the modulation spectra (MS) of the STEs of both clean and reverberant speech are normalized to a reference MS calculated from a clean speech data set. The inverse Fourier transform of the normalized subband MS then restores the subband temporal envelopes. We tested the proposed method on speech recognition in a reverberant room at different speaker-to-microphone distances (SMDs). The baseline for comparison was the recognition performance of traditional Mel-cepstral coefficients with mean and variance normalization. Averaged over SMDs from 50 cm to 400 cm, subband temporal envelope processing alone gave a 44.96% relative improvement, and normalization of the subband modulation spectrum gave a further 15.68% relative improvement. In total, the relative improvement was about 53.59%, which was better than that obtained with other temporal filtering and normalization methods.
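
The processing chain described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract gives no filter orders, band edges, envelope cutoff, or exact normalization rule, so the function names, the default parameters (`n_bands`, `f_lo`, `bandwidth`, `env_cutoff`), and the gain rule (reference magnitude divided by the mean magnitude of the condition the utterance comes from, with the phase left unchanged) are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def subband_envelopes(x, fs, n_bands=8, f_lo=100.0, bandwidth=400.0, env_cutoff=20.0):
    """Return an (n_bands, len(x)) array of subband temporal envelopes.

    All parameter values are illustrative assumptions, not taken from the paper.
    The highest band edge (f_lo + n_bands * bandwidth) must stay below fs / 2.
    """
    lowpass = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    envelopes = []
    for k in range(n_bands):
        lo = f_lo + k * bandwidth  # constant-bandwidth band edges
        hi = lo + bandwidth
        bandpass = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(bandpass, x)
        env = np.abs(hilbert(band))  # magnitude of the analytic signal
        envelopes.append(sosfiltfilt(lowpass, env))  # smooth the envelope
    return np.array(envelopes)


def mean_modulation_magnitude(env_list, n_fft):
    """Average modulation magnitude spectrum per band over several utterances."""
    mags = [np.abs(np.fft.rfft(e, n=n_fft, axis=-1)) for e in env_list]
    return np.mean(mags, axis=0)


def normalize_envelopes(envs, source_mean_ms, reference_ms, n_fft, eps=1e-8):
    """Scale the modulation spectrum toward the reference and restore envelopes.

    Assumed rule: gain = reference / source mean, applied per band and per
    modulation frequency. Requires envs.shape[-1] <= n_fft.
    """
    spec = np.fft.rfft(envs, n=n_fft, axis=-1)
    gain = reference_ms / (source_mean_ms + eps)
    restored = np.fft.irfft(spec * gain, n=n_fft, axis=-1)  # inverse Fourier transform
    return restored[..., : envs.shape[-1]]
```

In use, `reference_ms` would be computed once with `mean_modulation_magnitude` over envelopes from a clean training set, and `source_mean_ms` over envelopes from the condition being normalized (clean or reverberant). The zero-phase `sosfiltfilt` calls are a convenience for offline processing; a streaming recognizer would need causal filters instead.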