Leveraging CNN and Bi-LSTM in Indonesian G2P Using Transformer

A. Rachman, S. Suyanto, Ema Rachmawati
{"title":"利用CNN和Bi-LSTM在印尼G2P使用变压器","authors":"A. Rachman, S. Suyanto, Ema Rachmawati","doi":"10.1145/3457682.3457706","DOIUrl":null,"url":null,"abstract":"We apply a transformer called tensor2tensor toolkit, which is based on Tensorflow, to overcome the Grapheme-to-Phoneme conversion problem. This study performs conversions to produce pronunciation symbols for certain letter sequences in Indonesian particularly. The unavailability of the G2P conversion system in Indonesian is currently being faced, so research is being carried out to create a system that can solve this problem by applying the Transformer. The transformer has a simple network architecture based solely on the attention mechanism, so we took advantage of eliminating convolution and redundancies—complex recurrent and convolution neural networks including encoders and decoders as the basis for the sequence transduction model. The excellent performance of the model is obtained through the attention mechanism by connecting the encoder and decoder. By using this tool, we carry out to compare among KBBI and CMU dictionary datasets. We attained a word error rate (WER) of 6,7% on the KBBI data set after training for three days on two core CPUs, which has an accuracy of 93,3%, improving over the existing best results CMU dictionary dataset for 26% word error rate. In this study, we carried out a detailed experimental evaluation by assessing the processing time and the error rate of words and then compared it with state of the art. By demonstrating this Transformer, this tool successfully generalizes and then applies it to several Indonesian elements with limited training data and large training data. We concluded that the transformer model is suitable for dealing with the G2P problem at hand for this task.","PeriodicalId":142045,"journal":{"name":"2021 13th International Conference on Machine Learning and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Leveraging CNN and Bi-LSTM in Indonesian G2P Using Transformer\",\"authors\":\"A. Rachman, S. Suyanto, Ema Rachmawati\",\"doi\":\"10.1145/3457682.3457706\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We apply a transformer called tensor2tensor toolkit, which is based on Tensorflow, to overcome the Grapheme-to-Phoneme conversion problem. This study performs conversions to produce pronunciation symbols for certain letter sequences in Indonesian particularly. The unavailability of the G2P conversion system in Indonesian is currently being faced, so research is being carried out to create a system that can solve this problem by applying the Transformer. The transformer has a simple network architecture based solely on the attention mechanism, so we took advantage of eliminating convolution and redundancies—complex recurrent and convolution neural networks including encoders and decoders as the basis for the sequence transduction model. The excellent performance of the model is obtained through the attention mechanism by connecting the encoder and decoder. By using this tool, we carry out to compare among KBBI and CMU dictionary datasets. We attained a word error rate (WER) of 6,7% on the KBBI data set after training for three days on two core CPUs, which has an accuracy of 93,3%, improving over the existing best results CMU dictionary dataset for 26% word error rate. 
In this study, we carried out a detailed experimental evaluation by assessing the processing time and the error rate of words and then compared it with state of the art. By demonstrating this Transformer, this tool successfully generalizes and then applies it to several Indonesian elements with limited training data and large training data. We concluded that the transformer model is suitable for dealing with the G2P problem at hand for this task.\",\"PeriodicalId\":142045,\"journal\":{\"name\":\"2021 13th International Conference on Machine Learning and Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-02-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 13th International Conference on Machine Learning and Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3457682.3457706\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 13th International Conference on Machine Learning and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3457682.3457706","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

We apply the Transformer, implemented with the TensorFlow-based tensor2tensor toolkit, to the grapheme-to-phoneme (G2P) conversion problem. This study converts letter sequences in Indonesian into their corresponding pronunciation symbols. No G2P conversion system is currently available for Indonesian, so we develop one by applying the Transformer. The Transformer has a simple network architecture based solely on attention mechanisms, which lets us dispense with the complex recurrent and convolutional encoder-decoder networks that typically underlie sequence transduction models. The model's strong performance comes from the attention mechanism connecting the encoder and decoder. Using this toolkit, we compare results on the KBBI and CMU dictionary datasets. After training for three days on two CPU cores, we attained a word error rate (WER) of 6.7% (93.3% accuracy) on the KBBI dataset, improving over the existing best result of 26% WER on the CMU dictionary dataset. We carried out a detailed experimental evaluation of processing time and word error rate and compared the results with the state of the art. The Transformer generalizes well, applying successfully to several Indonesian tasks with both limited and large training data. We conclude that the Transformer model is well suited to the G2P problem addressed in this task.
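
Although the abstract names the tensor2tensor toolkit, no code accompanies the paper. The setup it describes maps onto tensor2tensor's standard problem registry, so a minimal sketch of registering Indonesian G2P as a character-level Text2TextProblem might look like the following; the class name, lexicon file name, and tab-separated word/phoneme format are illustrative assumptions, not details from the paper.

```python
# A hedged sketch of Indonesian G2P as a tensor2tensor problem.
# Class name, lexicon path, and file format are illustrative assumptions.
from tensor2tensor.data_generators import problem
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry


@registry.register_problem
class GraphemeToPhonemeIndonesian(text_problems.Text2TextProblem):
    """Maps Indonesian words (graphemes) to phoneme sequences."""

    @property
    def vocab_type(self):
        # Character-level vocabulary: graphemes in, phoneme symbols out.
        return text_problems.VocabType.CHARACTER

    @property
    def is_generate_per_split(self):
        # Generate one sample stream; T2T splits it into train/eval shards.
        return False

    @property
    def dataset_splits(self):
        return [
            {"split": problem.DatasetSplit.TRAIN, "shards": 9},
            {"split": problem.DatasetSplit.EVAL, "shards": 1},
        ]

    def generate_samples(self, data_dir, tmp_dir, dataset_split):
        # Assumed lexicon format: "word<TAB>phoneme sequence" per line,
        # e.g. "makan\tm a k a n" (hypothetical entry, not from the paper).
        with open("kbbi_lexicon.tsv", encoding="utf-8") as lexicon:
            for line in lexicon:
                word, phonemes = line.rstrip("\n").split("\t")
                yield {"inputs": word, "targets": phonemes}
```

With such a problem registered, training would follow the standard t2t-datagen and t2t-trainer flow with --model=transformer; the paper does not state which hparams set was used, so --hparams_set=transformer_base is only a plausible starting point.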
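The evaluation metric is also straightforward to reproduce: for G2P, WER is the fraction of test words whose entire predicted phoneme sequence fails to match the reference exactly, which is why the reported 6.7% WER and 93.3% accuracy sum to 100%. A minimal sketch, with hypothetical function name and example entries:

```python
# Word error rate for G2P: a word counts as an error if its predicted
# phoneme sequence is not an exact match to the reference sequence.
def g2p_word_error_rate(references, predictions):
    """references, predictions: parallel lists of phoneme-sequence strings."""
    assert len(references) == len(predictions)
    errors = sum(ref != pred for ref, pred in zip(references, predictions))
    return errors / len(references)


# Hypothetical example: one mismatch out of two words -> WER = 0.5.
refs = ["m a k a n", "m i n u m"]
preds = ["m a k a n", "m i n o m"]
print(g2p_word_error_rate(refs, preds))  # 0.5
```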