Word Syllabification for Indonesian Language using Transformer

Muhammad Haykal Kamil, S. Suyanto, M. Bijaksana
Published in: 2023 International Seminar on Intelligent Technology and Its Applications (ISITIA), 2023-07-26
DOI: 10.1109/ISITIA59021.2023.10221089 (https://doi.org/10.1109/ISITIA59021.2023.10221089)
Citations: 0

Abstract

Syllabification is the process of splitting a word into a sequence of syllables. It is used in Natural Language Processing (NLP) applications such as speech recognition, text-to-speech, and rhyme detection. Syllabification of Indonesian follows the General Guidelines for Indonesian Language Spelling (PUEBI). The corpus for this research pairs each main word with its syllables, and it combines the "50k KBBI 5-k fold" dataset with the "103k Named Entity 5-k fold" dataset. The model is evaluated using Word Error Rate (WER). The previous deep learning model for syllabification still has a comparatively high WER of 3.75%. The objective of this research is to lower the WER of deep learning approaches by using a Transformer with syllable tagging, because the Transformer can capture long-range contextual dependencies. The model achieves a WER of 3.68%, and it can be used generally for Indonesian words because the results for formal words and for named-entity words are both close to the average. This model is therefore currently the better deep learning model for Indonesian syllabification, since its average WER is lower than that of the other deep learning models.
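
To make the two technical ideas in the abstract concrete, the following is a minimal sketch of syllable tagging and of the WER metric. The abstract does not describe the exact tag set or the WER formula, so both are assumptions here: a binary per-character tag (1 means a syllable boundary follows the character) and WER as the percentage of words whose predicted syllabification differs from the reference. The function names and the example words are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def to_tags(syllabified: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Turn 'ma-kan' into ('makan', [0, 1, 0, 0, 0]).

    Assumed scheme: tag 1 marks a character that is followed by a
    syllable boundary, tag 0 marks every other character.
    """
    chars: List[str] = []
    tags: List[int] = []
    for ch in syllabified:
        if ch == sep:
            if tags:  # mark the character just before the separator
                tags[-1] = 1
        else:
            chars.append(ch)
            tags.append(0)
    return "".join(chars), tags


def from_tags(word: str, tags: Sequence[int], sep: str = "-") -> str:
    """Inverse of to_tags: rebuild the separated form from per-character tags."""
    out: List[str] = []
    for ch, tag in zip(word, tags):
        out.append(ch)
        if tag == 1:
            out.append(sep)
    return "".join(out)


def word_error_rate(references: Sequence[str], predictions: Sequence[str]) -> float:
    """Percentage of words whose predicted syllabification is not an exact match."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        return 0.0
    errors = sum(ref != pred for ref, pred in zip(references, predictions))
    return 100.0 * errors / len(references)


# Illustrative data only: one of four words is syllabified incorrectly.
refs = ["ma-kan", "se-ko-lah", "in-do-ne-si-a", "ba-ik"]
preds = ["ma-kan", "sek-o-lah", "in-do-ne-si-a", "ba-ik"]
print(word_error_rate(refs, preds))  # 25.0
```

Under this framing a sequence model only has to predict one tag per character, and a word counts as an error if any single boundary is misplaced, which is why a word-level metric is stricter than per-character accuracy.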