Sequence-to-Sequence Models for Grapheme to Phoneme Conversion on Large Myanmar Pronunciation Dictionary
Aye Mya Hlaing, Win Pa Pa
2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), 2019-10-01
https://doi.org/10.1109/O-COCOSDA46868.2019.9041225
Grapheme-to-phoneme (G2P) conversion is the task of producing the pronunciation of a given word. Neural sequence-to-sequence models have recently been applied to this task. This paper analyzes the effectiveness of neural sequence-to-sequence models for G2P conversion in the Myanmar language. The first large Myanmar pronunciation dictionary is introduced and used to build the sequence-to-sequence models. Four G2P conversion models (a joint sequence model, a Transformer, a simple encoder-decoder, and an attention-enabled encoder-decoder) are evaluated in terms of phoneme error rate (PER) and word error rate (WER). Analyses of three word classes and six phoneme error types are carried out and discussed in detail. According to the evaluations, the Transformer achieves results comparable to those of the traditional joint sequence model.
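The abstract does not spell out how PER and WER are scored, so the following is only a minimal sketch under the common convention: PER is the total phoneme-level edit distance divided by the total number of reference phonemes, and WER is the fraction of words whose predicted phoneme sequence is not an exact match. The function names `edit_distance` and `per_and_wer` and the toy phoneme symbols are hypothetical illustrations, not taken from the paper or its dictionary.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def per_and_wer(pairs: List[Tuple[List[str], List[str]]]) -> Tuple[float, float]:
    """Return (PER, WER) over (reference, hypothesis) phoneme sequences.

    Assumed definitions (not confirmed by the source):
    PER = sum of edit distances / total number of reference phonemes
    WER = number of words with any phoneme error / number of words
    """
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_phonemes = sum(len(ref) for ref, _ in pairs)
    wrong_words = sum(1 for ref, hyp in pairs if ref != hyp)
    return total_errors / total_phonemes, wrong_words / len(pairs)


# Toy data with made-up phoneme symbols, for illustration only.
pairs = [
    (["k", "a"], ["k", "a"]),            # exact match
    (["m", "i", "n"], ["m", "e", "n"]),  # one substitution
    (["t", "a", "u"], ["t", "a"]),       # one deletion
]
per, wer = per_and_wer(pairs)
print(f"PER = {per:.3f}, WER = {wer:.3f}")
```

Under these assumed definitions, a single wrong phoneme makes the whole word count as a WER error, which is why WER is typically much higher than PER for the same model output.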