{"title":"利用CNN和Bi-LSTM在印尼G2P使用变压器","authors":"A. Rachman, S. Suyanto, Ema Rachmawati","doi":"10.1145/3457682.3457706","DOIUrl":null,"url":null,"abstract":"We apply a transformer called tensor2tensor toolkit, which is based on Tensorflow, to overcome the Grapheme-to-Phoneme conversion problem. This study performs conversions to produce pronunciation symbols for certain letter sequences in Indonesian particularly. The unavailability of the G2P conversion system in Indonesian is currently being faced, so research is being carried out to create a system that can solve this problem by applying the Transformer. The transformer has a simple network architecture based solely on the attention mechanism, so we took advantage of eliminating convolution and redundancies—complex recurrent and convolution neural networks including encoders and decoders as the basis for the sequence transduction model. The excellent performance of the model is obtained through the attention mechanism by connecting the encoder and decoder. By using this tool, we carry out to compare among KBBI and CMU dictionary datasets. We attained a word error rate (WER) of 6,7% on the KBBI data set after training for three days on two core CPUs, which has an accuracy of 93,3%, improving over the existing best results CMU dictionary dataset for 26% word error rate. In this study, we carried out a detailed experimental evaluation by assessing the processing time and the error rate of words and then compared it with state of the art. By demonstrating this Transformer, this tool successfully generalizes and then applies it to several Indonesian elements with limited training data and large training data. We concluded that the transformer model is suitable for dealing with the G2P problem at hand for this task.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Leveraging CNN and Bi-LSTM in Indonesian G2P Using Transformer\",\"authors\":\"A. Rachman, S. Suyanto, Ema Rachmawati\",\"doi\":\"10.1145/3457682.3457706\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We apply a transformer called tensor2tensor toolkit, which is based on Tensorflow, to overcome the Grapheme-to-Phoneme conversion problem. This study performs conversions to produce pronunciation symbols for certain letter sequences in Indonesian particularly. The unavailability of the G2P conversion system in Indonesian is currently being faced, so research is being carried out to create a system that can solve this problem by applying the Transformer. The transformer has a simple network architecture based solely on the attention mechanism, so we took advantage of eliminating convolution and redundancies—complex recurrent and convolution neural networks including encoders and decoders as the basis for the sequence transduction model. The excellent performance of the model is obtained through the attention mechanism by connecting the encoder and decoder. By using this tool, we carry out to compare among KBBI and CMU dictionary datasets. We attained a word error rate (WER) of 6,7% on the KBBI data set after training for three days on two core CPUs, which has an accuracy of 93,3%, improving over the existing best results CMU dictionary dataset for 26% word error rate. 
In this study, we carried out a detailed experimental evaluation by assessing the processing time and the error rate of words and then compared it with state of the art. By demonstrating this Transformer, this tool successfully generalizes and then applies it to several Indonesian elements with limited training data and large training data. We concluded that the transformer model is suitable for dealing with the G2P problem at hand for this task.\",\"PeriodicalId\":142045,\"journal\":{\"name\":\"2021 13th International Conference on Machine Learning and Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-02-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 13th International Conference on Machine Learning and Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3457682.3457706\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 13th International Conference on Machine Learning and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3457682.3457706","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Leveraging CNN and Bi-LSTM in Indonesian G2P Using Transformer
We apply a Transformer, implemented with the TensorFlow-based tensor2tensor toolkit, to the grapheme-to-phoneme (G2P) conversion problem: producing pronunciation symbols for given letter sequences, specifically in Indonesian. No G2P conversion system is currently available for Indonesian, so this research builds one by applying the Transformer. The Transformer has a simple network architecture based solely on the attention mechanism, eliminating the complex recurrent and convolutional encoder-decoder networks that usually form the basis of sequence transduction models; the model's strong performance comes from the attention mechanism connecting the encoder and decoder. Using this toolkit, we compare results on the KBBI and CMU dictionary datasets. After training for three days on two CPU cores, we attained a word error rate (WER) of 6.7% (an accuracy of 93.3%) on the KBBI dataset, improving on the best existing result of 26% WER on the CMU dictionary dataset. We carried out a detailed experimental evaluation, assessing processing time and word error rate, and compared the results with the state of the art. The Transformer generalizes successfully, applying to several parts of the Indonesian data with both limited and large training sets. We conclude that the Transformer model is well suited to the G2P problem addressed in this task.
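The abstract does not include code, but the tensor2tensor toolkit it names exposes a standard way to register a text-to-text task. The following is a minimal sketch of how an Indonesian G2P problem could be set up with that API; the class name, the lexicon file `kbbi_lexicon.tsv`, its tab-separated format, and the vocabulary settings are all illustrative assumptions, not the authors' actual configuration.

```python
# Sketch of a tensor2tensor text-to-text problem for Indonesian G2P.
# Assumed lexicon format: one "word<TAB>phoneme sequence" pair per line.
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry


@registry.register_problem
class GraphemeToPhonemeIndonesian(text_problems.Text2TextProblem):
    """Maps space-separated graphemes to space-separated phoneme symbols."""

    @property
    def approx_vocab_size(self):
        return 2**8  # small vocabulary: individual characters and phoneme symbols

    @property
    def is_generate_per_split(self):
        return False  # generate samples once; t2t splits them into train/dev

    def generate_samples(self, data_dir, tmp_dir, dataset_split):
        with open("kbbi_lexicon.tsv", encoding="utf-8") as lexicon:  # hypothetical file
            for line in lexicon:
                word, phonemes = line.rstrip("\n").split("\t")
                # Space-separate the graphemes so each is its own input symbol.
                yield {"inputs": " ".join(word), "targets": phonemes}
```

Training would then be launched with the toolkit's standard CLI, along the lines of `t2t-trainer --problem=grapheme_to_phoneme_indonesian --model=transformer --hparams_set=transformer_base --data_dir=$DATA_DIR --output_dir=$OUT_DIR`; the abstract does not state which hyperparameter set the authors used.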
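The evaluation metric is word error rate. Since the abstract pairs 6.7% WER with 93.3% accuracy, WER here is evidently whole-word error, i.e. 1 minus word accuracy: the fraction of words whose predicted phoneme sequence differs from the reference. A minimal sketch of that computation (not the authors' evaluation script) follows.

```python
def word_error_rate(predictions, references):
    """Fraction of words whose predicted pronunciation differs from the reference.

    Whole-word WER (1 - word accuracy), consistent with the abstract's
    pairing of 6.7% WER with 93.3% accuracy. A sketch, not the paper's code.
    """
    assert len(predictions) == len(references)
    errors = sum(1 for p, r in zip(predictions, references) if p != r)
    return errors / len(references)


# Example with made-up Indonesian entries: one mismatch out of three words.
preds = ["s a t u", "d u a", "t i g @"]
refs = ["s a t u", "d u a", "t i g a"]
print(word_error_rate(preds, refs))  # 0.333...
```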