Word Syllabification for Indonesian Language using Transformer
Authors: Muhammad Haykal Kamil, S. Suyanto, M. Bijaksana
Published in: 2023 International Seminar on Intelligent Technology and Its Applications (ISITIA), 2023-07-26
DOI: 10.1109/ISITIA59021.2023.10221089 (https://doi.org/10.1109/ISITIA59021.2023.10221089)
Abstract
Syllabification is the process of splitting a word into a sequence of syllables. It is used in Natural Language Processing (NLP) applications such as speech recognition, text-to-speech, and rhyme detection. Syllabification of the Indonesian language follows the General Guidelines for Indonesian Language Spelling (PUEBI). The corpus for this research pairs each word with its syllables and combines the "50k KBBI 5-k fold" dataset with the "103k Named Entity 5-k fold" dataset. The model is evaluated using Word Error Rate (WER). The WER of the previous deep learning model for syllabification is still high, at 3.75%. The objective of this research is to lower the WER of deep learning syllabification by using a Transformer with syllable tagging, which can handle long contextual dependencies. The model achieves a WER of 3.68% and can be used for Indonesian words in general, because the results for formal words and for named-entity words are both close to the average. This model therefore currently outperforms the other deep learning models for Indonesian syllabification, as its average WER is lower.
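To make the tagging formulation and the WER metric concrete, here is a minimal Python sketch. It rests on assumptions not stated in the abstract: a two-label scheme in which each character is tagged "B" (begins a syllable) or "I" (continues one), and WER computed as the percentage of words whose predicted syllabification differs from the reference. The function names and the example words are illustrative and are not taken from the paper or its corpus.

```python
def to_tags(syllables):
    """Encode a syllable list as one tag per character.

    Assumed scheme: 'B' marks the first character of a syllable,
    'I' marks every following character of that syllable.
    """
    tags = []
    for syllable in syllables:
        if not syllable:
            continue  # ignore empty syllables
        tags.extend(["B"] + ["I"] * (len(syllable) - 1))
    return tags


def from_tags(word, tags):
    """Decode per-character tags back into a list of syllables."""
    if len(word) != len(tags):
        raise ValueError("word and tags must have the same length")
    syllables = []
    for char, tag in zip(word, tags):
        if tag == "B" or not syllables:
            syllables.append(char)  # start a new syllable
        else:
            syllables[-1] += char   # extend the current syllable
    return syllables


def word_error_rate(predicted, reference):
    """Percentage of words whose syllabification is not an exact match."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return 100.0 * errors / len(reference)


# Illustrative words only (not from the paper's corpus).
reference = [["ma", "kan"], ["ba", "ngun"]]
predicted = [["ma", "kan"], ["ban", "gun"]]  # second word is split wrongly

print(to_tags(["ma", "kan"]))                 # ['B', 'I', 'B', 'I', 'I']
print(from_tags("bangun", to_tags(["ba", "ngun"])))  # ['ba', 'ngun']
print(word_error_rate(predicted, reference))  # 50.0
```

Under this reading, a word counts as an error if any one of its syllable boundaries is misplaced, so small WER differences such as 3.75% versus 3.68% correspond to only a few words per thousand.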