An Experimental Study on Vietnamese POS Tagging

2009 International Conference on Asian Language Processing. Pub Date: 2009-12-07. DOI: 10.1109/IALP.2009.14

Oanh T. K. Tran, C. Le, Quang-Thuy Ha, Quynh Lê

Citations: 19

Abstract

Part-of-speech (POS) tagging is an important task in Natural Language Processing (NLP). It has, however, drawn little attention from Vietnamese researchers worldwide. In this paper, we present an experimental study on Vietnamese POS tagging. Motivated by Chinese research and by the characteristics of Vietnamese, we propose a new kind of feature based on the idea of word composition, which we call morpheme-based features. To verify the effectiveness of these features, we use three machine learning techniques: maximum entropy models (MEM), conditional random fields (CRF), and support vector machines (SVM). We also built a Vietnamese POS-tagged corpus of approximately 8000 sentences from different genres for the experiments. The results show that morpheme-based features always give higher precision than previous approaches, which usually rely on word-based features. With morpheme-based features, we achieved a precision of 91.64%.
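The abstract does not list the feature templates, so the following is only a minimal sketch of how word-based and morpheme-based features might be extracted side by side. It assumes that a segmented Vietnamese word joins its syllables with underscores (for example `sinh_viên`) and treats each syllable as a morpheme; the function names, feature names, and sample sentence are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def word_features(words: List[str], i: int) -> Dict[str, str]:
    """Word-based context features: the current word and its two neighbours."""
    return {
        "w0": words[i],
        "w-1": words[i - 1] if i > 0 else "<BOS>",
        "w+1": words[i + 1] if i + 1 < len(words) else "<EOS>",
    }


def morpheme_features(words: List[str], i: int) -> Dict[str, str]:
    """Hypothetical morpheme-based features taken from the current word's syllables.

    Assumption: syllables inside a segmented word are joined by "_".
    """
    syllables = words[i].split("_")
    return {
        "first_syl": syllables[0],
        "last_syl": syllables[-1],
        "n_syl": str(len(syllables)),
    }


def token_features(words: List[str], i: int) -> Dict[str, str]:
    """Combine both feature families into one feature dictionary per token."""
    feats = word_features(words, i)
    feats.update(morpheme_features(words, i))
    return feats


# Illustrative, pre-segmented sentence (not from the paper's corpus).
sentence = ["sinh_viên", "học", "tiếng_Việt"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Feature dictionaries of this shape could then be vectorized and passed to a maximum entropy, CRF, or SVM learner; which templates the authors actually used would have to be checked against the full paper.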
