{"title":"越南语词性标注的实验研究","authors":"Oanh T. K. Tran, C. Le, Quang-Thuy Ha, Quynh Lê","doi":"10.1109/IALP.2009.14","DOIUrl":null,"url":null,"abstract":"In Natural Language Processing (NLP), Part-of-speech tagging is one of the important tasks. It, however, has not drawn much attention of Vietnamese researchers all over the world. In this paper, we present an experimental study on Vietnamese POS tagging. Motivated from Chinese research and Vietnamese characteristics, we present a new kind of features based on the idea of word composition. We call it morpheme based features. To verify the effectiveness of these features, we use three powerful machine learning techniques - MEM, CRF and SVM. In addition, we also built a Vietnamese POS-tagged corpus with approximately 8000 sentences of different genres to conduct experiments. Experimental results showed that morpheme-based features always give higher precision in comparison with previous approaches - usually word-based features. We achieved the precision of 91.64% by using these morpheme-based features.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"115 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"An Experimental Study on Vietnamese POS Tagging\",\"authors\":\"Oanh T. K. Tran, C. Le, Quang-Thuy Ha, Quynh Lê\",\"doi\":\"10.1109/IALP.2009.14\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In Natural Language Processing (NLP), Part-of-speech tagging is one of the important tasks. It, however, has not drawn much attention of Vietnamese researchers all over the world. In this paper, we present an experimental study on Vietnamese POS tagging. Motivated from Chinese research and Vietnamese characteristics, we present a new kind of features based on the idea of word composition. We call it morpheme based features. To verify the effectiveness of these features, we use three powerful machine learning techniques - MEM, CRF and SVM. In addition, we also built a Vietnamese POS-tagged corpus with approximately 8000 sentences of different genres to conduct experiments. Experimental results showed that morpheme-based features always give higher precision in comparison with previous approaches - usually word-based features. We achieved the precision of 91.64% by using these morpheme-based features.\",\"PeriodicalId\":156840,\"journal\":{\"name\":\"2009 International Conference on Asian Language Processing\",\"volume\":\"115 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 International Conference on Asian Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP.2009.14\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Asian Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2009.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In Natural Language Processing (NLP), Part-of-speech tagging is one of the important tasks. It, however, has not drawn much attention of Vietnamese researchers all over the world. In this paper, we present an experimental study on Vietnamese POS tagging. Motivated from Chinese research and Vietnamese characteristics, we present a new kind of features based on the idea of word composition. We call it morpheme based features. To verify the effectiveness of these features, we use three powerful machine learning techniques - MEM, CRF and SVM. In addition, we also built a Vietnamese POS-tagged corpus with approximately 8000 sentences of different genres to conduct experiments. Experimental results showed that morpheme-based features always give higher precision in comparison with previous approaches - usually word-based features. We achieved the precision of 91.64% by using these morpheme-based features.