Acoustic analysis and recognition of whispered speech

IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01. Pub Date: 2001-12-09. DOI: 10.1109/ASRU.2001.1034676

Taisuke Itoh, K. Takeda, F. Itakura


Citations: 13

Abstract




The acoustic properties of whispered speech and a method for recognizing it are discussed. A database of whispered speech, normal speech, and the corresponding facial video images was prepared, covering more than 6,000 sentences from 100 speakers. A comparison between whispered and normal utterances shows that: 1) the cepstrum distance between them is 4 dB for voiced and 2 dB for unvoiced phonemes; 2) the spectral tilt of whispered speech is less sloped than that of normal speech; 3) the frequency of the lower formants (below 1.5 kHz) is lower than that of normal speech. Acoustic models (HMMs) trained on the whispered speech database attain an accuracy of 60% in syllable recognition experiments. This accuracy improves to 63% when MLLR (maximum likelihood linear regression) adaptation is applied, whereas normal-speech HMMs adapted with whispered speech attain only 56% syllable accuracy.
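The abstract reports cepstrum distances in dB but does not state the formula used. As a minimal sketch, assuming the commonly used definition d = (10 / ln 10) * sqrt(2 * sum_k (c_k - c'_k)^2) over cepstral coefficients c_1..c_p (excluding c_0), the distance between time-aligned whispered and normal frames could be computed as follows. The function names and the frame-alignment step are illustrative assumptions, not taken from the paper.

```python
import math


def cepstral_distance_db(c_a, c_b):
    """Cepstral distance in dB between two cepstrum vectors.

    Assumes each vector holds coefficients c_1..c_p (c_0 excluded) and
    uses the common definition (10 / ln 10) * sqrt(2 * sum of squared diffs).
    """
    if len(c_a) != len(c_b):
        raise ValueError("cepstrum vectors must have the same order")
    squared_diff = sum((a - b) ** 2 for a, b in zip(c_a, c_b))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_diff)


def mean_cepstral_distance_db(frames_a, frames_b):
    """Average the per-frame distance over already time-aligned frame pairs."""
    pairs = list(zip(frames_a, frames_b))
    if not pairs:
        raise ValueError("no frame pairs to compare")
    return sum(cepstral_distance_db(a, b) for a, b in pairs) / len(pairs)
```

In practice the whispered and normal frames would first need to be aligned per phoneme (for example by forced alignment or dynamic time warping) before averaging; that step is outside this sketch.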

