HMM-neural network monophone models for computer-based articulation training for the hearing impaired

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date: 2003-07-06. DOI: 10.1109/ICASSP.2003.1202373

M. Devarajan, Fansheng Meng, P. Hix, S. Zahorian


Citations: 1

Abstract



A visual speech training aid for persons with hearing impairments has been developed using a Windows-based multimedia computer. Previous papers (Zahorian, S. et al., Int. Conf. on Spoken Language Processing, 2002; Zahorian and Nossair, Z.B., IEEE Trans. on Speech and Audio Processing, vol.7, no.4, p.414-25, 1999; Zimmer, A. et al., ICASSP, vol.6, p.3625-8, 1998; Zahorian and Jagharghi, A., J. Acoust. Soc. Amer., vol.94, no.4, p.1966-82, 1993) have described the signal processing steps and display options for giving real-time feedback about the quality of pronunciation for 10 steady-state American English monophthong vowels (/aa/, /iy/, /uw/, /ae/, /er/, /ih/, /eh/, /ao/, /ah/, and /uh/). This vowel training aid is thus referred to as a vowel articulation training aid (VATA). We now describe methods to develop a monophone-based hidden Markov model/neural network recognizer such that real-time visual feedback can be given about the quality of pronunciation of short words and phrases. Experimental results are reported which indicate a high degree of accuracy for labeling and segmenting the CVC (consonant-vowel-consonant) database developed for "training" the display.
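As a rough illustration of what labeling and segmenting a CVC token with a hybrid HMM/neural network recognizer can involve, the sketch below aligns a known consonant-vowel-consonant phone sequence to per-frame phone posteriors using Viterbi forced alignment. This is not the authors' implementation: the abstract does not give the network, the HMM topology, or the features, so the sketch assumes one state per monophone, no transition probabilities, and raw posteriors used directly as frame scores (hybrid systems often divide by phone priors first). The names `force_align` and `to_log` and the toy posterior values are hypothetical.

```python
import math


def force_align(log_post, phone_seq):
    """Viterbi forced alignment of a left-to-right phone sequence to frames.

    log_post:  list with one dict per frame, mapping phone label -> log posterior
               (assumed to come from some frame-level neural network classifier).
    phone_seq: the known phone sequence, e.g. ["b", "iy", "t"] for a CVC word.
    Returns a list of (phone, start_frame, end_frame_exclusive) segments.
    """
    n_frames, n_states = len(log_post), len(phone_seq)
    if n_frames < n_states:
        raise ValueError("need at least one frame per phone")

    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]

    # The path must start in the first phone.
    score[0][0] = log_post[0][phone_seq[0]]

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg_inf:
                score[t][s] = best + log_post[t][phone_seq[s]]

    # The path must end in the last phone; trace back to recover it.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()

    # Collapse the frame-level state path into labeled segments.
    segments, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            segments.append((phone_seq[path[start]], start, t))
            start = t
    return segments


def to_log(frames):
    """Convert per-frame posterior dicts to log posteriors."""
    return [{p: math.log(v) for p, v in f.items()} for f in frames]


# Toy posteriors for six frames of a hypothetical /b iy t/ token.
frames = to_log([
    {"b": 0.8, "iy": 0.1, "t": 0.1},
    {"b": 0.7, "iy": 0.2, "t": 0.1},
    {"b": 0.1, "iy": 0.8, "t": 0.1},
    {"b": 0.1, "iy": 0.8, "t": 0.1},
    {"b": 0.1, "iy": 0.7, "t": 0.2},
    {"b": 0.1, "iy": 0.1, "t": 0.8},
])

print(force_align(frames, ["b", "iy", "t"]))
```

On the toy data the alignment assigns frames 0-1 to /b/, frames 2-4 to /iy/, and frame 5 to /t/. In a training aid of the kind described, segment boundaries like these would tell the display which stretch of the utterance to score for each phone; how the actual system computes and uses them is not stated in the abstract.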
