Modeling Pronunciation Variation for Bi-Lingual Mandarin/Taiwanese Speech Recognition

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date: 2005-09-01. DOI: 10.30019/IJCLCLP.200509.0005

Dau-Cheng Lyu, Ren-Yuan Lyu, Yuang-Chin Chiang, Chun-Nan Hsu


Citations: 7

Abstract

This paper describes a bilingual large-vocabulary speech recognition experiment based on modeling pronunciation variation. The two languages under study are Mandarin Chinese and Taiwanese (Min-nan). The two languages are largely mutually unintelligible, yet they share many words that are written with the same Chinese characters and carry the same meanings but are pronounced differently. From observation of the bilingual corpus, we found five types of pronunciation variation for Chinese characters. A one-pass, three-layer recognizer was developed that combines bilingual acoustic models, an integrated pronunciation model, and a tree-structured search network. The recognizer's performance was evaluated under three different pronunciation models. The results showed that the character error rate with the integrated pronunciation model was lower than that with pronunciation models built using either the knowledge-based or the data-driven approach. The relative frequency ratio was also used as a measure for choosing the best number of pronunciation variations for each Chinese character. Finally, the best character error rates on the Mandarin and Taiwanese test sets were 16.2% and 15.0%, respectively, when the average number of pronunciations per Chinese character was 3.9.
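The two quantities the abstract reports, character error rate and the number of pronunciations kept per character, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The character error rate below is the standard edit-distance definition. The pruning rule is an assumption, since the abstract does not define the relative frequency ratio: here a variant's ratio is taken to be its count divided by the count of the character's most frequent variant. All function names, the threshold, and the toy counts are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total edit distance divided by the total number of reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def prune_by_relative_frequency(counts, threshold):
    """Keep variants whose count / top-variant count is at least `threshold`.

    ASSUMPTION: this ratio definition is illustrative; the abstract does not
    spell out how the relative frequency ratio is computed.
    """
    top = max(counts.values())
    return {p: c for p, c in counts.items() if c / top >= threshold}


def average_pronunciations(lexicon):
    """Mean number of pronunciation variants per character in a lexicon."""
    return sum(len(variants) for variants in lexicon.values()) / len(lexicon)


if __name__ == "__main__":
    # Toy, made-up counts of observed pronunciations for two placeholder characters.
    observed = {
        "char_A": {"p1": 10, "p2": 5, "p3": 1},
        "char_B": {"q1": 8, "q2": 7},
    }
    pruned = {ch: prune_by_relative_frequency(c, threshold=0.3)
              for ch, c in observed.items()}
    print(pruned)
    print(average_pronunciations(pruned))
    print(character_error_rate(["abcd"], ["abd"]))
```

Raising the threshold keeps fewer variants per character, which shrinks the search space but risks dropping a pronunciation that speakers actually use. Sweeping the threshold and scoring each resulting lexicon with the character error rate is one way to look for a trade-off point of the kind the abstract reports.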
