Parameterization of Dominant Spectral Peak Trajectory for Whisper Speech Recognition

Chang Feng, Xiaolong Wu, Mingxing Xu, T. Zheng
DOI: 10.23919/APSIPAASC55919.2022.9980259
URL: https://doi.org/10.23919/APSIPAASC55919.2022.9980259
Published in: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Publication date: 2022-11-07
Citation count: 1

Abstract

Automatic speech recognition (ASR) systems trained on normal speech generally suffer performance degradation on whisper speech. To address this problem, this paper exploits the factors that normal and whisper speech have in common to construct a whisper speech recognizer with normal speech data. We propose to parameterize the dominant spectral peak trajectory (Ppeak) to capture these similarities and to concatenate it with the traditional Mel-Frequency Cepstral Coefficients (MFCC) and Human Factor Cepstral Coefficients (HFCC), respectively, to form new features. The proposed features improve the accuracy of whisper speech recognition. Performance improves further when the similarity is enhanced by removing low-frequency information. Experimental results show that, for HFCC, the performance degradation between the matched and mismatched scenarios was reduced by a relative 90.31% in Word Error Rate (WER) after similarity enhancement at a cut-off frequency of 500 Hz. Furthermore, we ultimately achieved a relative WER reduction of 69.60% in the mismatched scenario compared with conventional MFCC, even without whisper speech data for training.
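
The abstract does not say how Ppeak is parameterized, so the following is only a minimal sketch of the general idea under stated assumptions: track the frequency of the largest spectral magnitude in each frame, ignore bins below a cut-off (500 Hz, as in the abstract), and append the resulting track to an existing cepstral feature matrix. The function names, the 25 ms / 10 ms framing at 16 kHz, the Hann window, and the use of a plain argmax are all illustrative choices, not the authors' method. The last helper shows the arithmetic behind a "relative reduction" figure.

```python
import numpy as np


def dominant_peak_track(signal, sample_rate, frame_len=400, hop=160, cutoff_hz=500.0):
    """Per-frame frequency (Hz) of the largest spectral magnitude at or above cutoff_hz.

    Hypothetical helper: a stand-in for a dominant-spectral-peak trajectory,
    not the parameterization used in the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    keep = freqs >= cutoff_hz  # drop low-frequency bins before picking the peak
    kept_freqs = freqs[keep]
    peaks = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        magnitude = np.abs(np.fft.rfft(frame))
        peaks[i] = kept_freqs[np.argmax(magnitude[keep])]
    return peaks


def append_peak_feature(cepstra, peaks):
    """Concatenate the peak track as one extra column onto (n_frames, n_coeffs) cepstra."""
    return np.hstack([cepstra, peaks[:, None]])


def relative_reduction(baseline, improved):
    """Relative reduction in percent, e.g. of a WER value or a WER gap."""
    return 100.0 * (baseline - improved) / baseline
```

In use, `cepstra` would be an MFCC or HFCC matrix computed with the same framing, so that its row count matches the length of the peak track.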