State-of-the-art Vietnamese word segmentation
Song Nguyen Duc Cong, Quoc Hung Ngo, Rachsuda Jiamthapthaksin
2016 2nd International Conference on Science in Information Technology (ICSITech), 2016-10-01
DOI: 10.1109/ICSITech.2016.7852619
Citations: 5
Abstract
Word segmentation is the first step of any task in Vietnamese language processing. This paper reviews state-of-the-art approaches and systems for Vietnamese word segmentation. To give an overview of all stages, from building corpora to developing toolkits, we discuss corpus construction, the approaches applied to word segmentation, and existing toolkits for segmenting words in Vietnamese sentences. In addition, this study sets out the motivations for building corpora and applying machine learning techniques to improve the accuracy of Vietnamese word segmentation. Based on our observations, it also reports several achievements and limitations of existing Vietnamese word segmentation systems.