A Javanese Syllabifier Based on its Orthographic System
Lucia D. Krisnawati, Aditya W. Mahastama
2018 International Conference on Asian Language Processing (IALP), 2018-11-01
DOI: 10.1109/IALP.2018.8629173 (https://doi.org/10.1109/IALP.2018.8629173)
Citations: 5
Abstract
Automatic syllabification is considered a solved task for high-resource languages. However, it is still badly needed for under-resourced and critical languages such as Javanese. Syllabification is the backbone of any task related to transliteration for abugida or syllabary scripts, word recognition, and speech synthesis. Because data sets and resources are lacking, this research applied a Finite State Transducer model to build a syllabifier for Javanese documents written in Latin script. The segmentation rules are based on the orthographic system of the Javanese script. The experiment shows that the accuracy of segmenting words into syllables reaches 95.56% on a data set scraped from Wiki and 97.92% on a data set taken from the Javanese magazine Djaka Lodang. These accuracy rates indicate that our syllabifier is capable of providing a corpus of Javanese syllables for more complex applications such as transliteration, word boundary prediction, or Optical Character Recognition for Javanese scripts.
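The abstract does not list the segmentation rules, so the following is only a minimal sketch of what a rule-based, state-machine-style syllabifier for Latin-script Javanese might look like. Everything in it is an assumption made for illustration, not the authors' transducer: the vowel set, the digraph list (`ng`, `ny`, `th`, `dh`), the set of medial consonants allowed as the second member of an onset cluster, and the function names `tokenize` and `syllabify`.

```python
# Illustrative sketch only: these rule sets are assumptions, not the paper's rules.
VOWELS = set("aiueoéè")
DIGRAPHS = ("ng", "ny", "th", "dh")   # treated as single consonant units
MEDIALS = {"r", "l", "y", "w"}        # assumed second members of onset clusters


def tokenize(word):
    """Split a word into units, keeping assumed digraphs together."""
    units = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def syllabify(word):
    """Segment a Latin-script word into syllables with simplified rules."""
    units = tokenize(word.lower())
    if not units:
        return []

    syllables = []
    current = []       # units of the syllable being built
    pending = []       # consonants seen since the last vowel
    seen_vowel = False

    for unit in units:
        if unit not in VOWELS:
            pending.append(unit)
            continue
        if not seen_vowel:
            # Word-initial consonants all belong to the first onset.
            current = pending + [unit]
            seen_vowel = True
        else:
            # Decide how many pending consonants start the next syllable.
            if (len(pending) >= 2 and pending[-1] in MEDIALS
                    and pending[-2] not in MEDIALS):
                onset = pending[-2:]      # e.g. "tr" in "putra"
            else:
                onset = pending[-1:]      # at most one consonant
            coda = pending[:len(pending) - len(onset)]
            syllables.append("".join(current + coda))
            current = onset + [unit]
        pending = []

    # Trailing consonants close the final syllable.
    syllables.append("".join(current + pending))
    return syllables


if __name__ == "__main__":
    for w in ["mangan", "putra", "tresna", "aksara"]:
        print(w, "->", ".".join(syllabify(w)))
```

A production system would more likely compile such rules with a finite-state toolkit and would need rules derived from the Javanese script's orthography, which this sketch does not attempt to reproduce.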