{"title":"Improving Grapheme-to-Phoneme Conversion by Investigating Copying Mechanism in Recurrent Architectures","authors":"Abhishek Niranjan, M. Shaik","doi":"10.1109/ASRU46091.2019.9003729","DOIUrl":null,"url":null,"abstract":"Attention driven encoder-decoder architectures have become highly successful in various sequence-to-sequence learning tasks. We propose copy-augmented Bi-directional Long Short-Term Memory based Encoder-Decoder architecture for the Grapheme-to-Phoneme conversion. In Grapheme-to-Phoneme task, a number of character units in words possess high degree of similarity with some phoneme unit(s). Thus, we make an attempt to capture this characteristic using copy-augmented architecture. Our proposed model automatically learns to generate phoneme sequences during inference by copying source token embeddings to the decoder's output in a controlled manner. To our knowledge, this is the first time the copy-augmentation is being investigated for Grapheme-to-Phoneme conversion task. We validate our experiments over accented and non-accented publicly available CMU-Dict datasets and achieve State-of-The-Art performances in terms of both phoneme and word error rates. Further, we verify the applicability of our proposed approach on Hindi Lexicon and show that our model outperforms all recent State-of-The-Art results.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003729","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Attention-driven encoder-decoder architectures have become highly successful in various sequence-to-sequence learning tasks. We propose a copy-augmented, Bi-directional Long Short-Term Memory (Bi-LSTM) based encoder-decoder architecture for Grapheme-to-Phoneme conversion. In the Grapheme-to-Phoneme task, many character units in a word have a high degree of similarity with one or more phoneme units, and we attempt to capture this characteristic with a copy-augmented architecture. The proposed model learns to generate phoneme sequences at inference time by copying source token embeddings to the decoder's output in a controlled manner. To our knowledge, this is the first time copy augmentation has been investigated for the Grapheme-to-Phoneme conversion task. We validate our experiments on the publicly available accented and non-accented CMUDict datasets and achieve state-of-the-art performance in terms of both phoneme and word error rates. Further, we verify the applicability of our approach on a Hindi lexicon and show that our model outperforms all recent state-of-the-art results.
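To make the copy mechanism described above concrete, the following is a minimal PyTorch sketch of a single decoder step that mixes a generation distribution over phonemes with a copy distribution derived from attention over the Bi-LSTM encoder states. This is an illustrative assumption based on pointer-generator-style copying, not the authors' exact formulation; the module name, the `copy_gate`, and the grapheme-to-phoneme projection `copy_proj` are hypothetical names introduced here for clarity.

```python
# Illustrative sketch only: a pointer-generator-style copy gate for G2P decoding.
# The paper's exact parameterization may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CopyAugmentedDecoderStep(nn.Module):
    def __init__(self, phoneme_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.phoneme_emb = nn.Embedding(phoneme_vocab, emb_dim)
        # Input feeding: previous phoneme embedding concatenated with previous context.
        self.cell = nn.LSTMCell(emb_dim + 2 * hid_dim, hid_dim)
        self.attn = nn.Linear(hid_dim, 2 * hid_dim)                 # attention over Bi-LSTM states
        self.generate = nn.Linear(hid_dim + 2 * hid_dim, phoneme_vocab)
        # Assumed component: routes attended (copied) grapheme content into the phoneme vocabulary.
        self.copy_proj = nn.Linear(2 * hid_dim, phoneme_vocab)
        # Scalar gate controlling how much probability mass is copied vs. generated.
        self.copy_gate = nn.Linear(hid_dim + 2 * hid_dim + emb_dim, 1)

    def forward(self, prev_phoneme, state, enc_states, prev_context):
        # prev_phoneme: (B,)  enc_states: (B, T_src, 2*hid_dim)  prev_context: (B, 2*hid_dim)
        emb = self.phoneme_emb(prev_phoneme)                         # (B, emb_dim)
        h, c = self.cell(torch.cat([emb, prev_context], dim=-1), state)
        # Attention distribution over the encoder (grapheme) positions.
        scores = torch.bmm(enc_states, self.attn(h).unsqueeze(-1)).squeeze(-1)   # (B, T_src)
        attn_dist = F.softmax(scores, dim=-1)
        context = torch.bmm(attn_dist.unsqueeze(1), enc_states).squeeze(1)       # (B, 2*hid_dim)
        # Generation distribution over phonemes.
        p_gen = F.softmax(self.generate(torch.cat([h, context], dim=-1)), dim=-1)
        # Copy distribution: attended source content projected onto the phoneme vocabulary.
        p_copy = F.softmax(self.copy_proj(context), dim=-1)
        # Mix the two distributions with a learned, per-step gate.
        gate = torch.sigmoid(self.copy_gate(torch.cat([h, context, emb], dim=-1)))
        p_final = gate * p_copy + (1.0 - gate) * p_gen
        return p_final, (h, c), context
```

In this sketch the encoder's final states would initialize `state`, `prev_context` starts as zeros, and training minimizes the negative log-likelihood of the reference phoneme under `p_final` at each step; the gate lets the model fall back on copying when a grapheme maps almost directly to a phoneme.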