Nguyen Thi Huong Thao, N. Thai, Nguyen Le Minh, Hà Quang Thụy
{"title":"Vietnamese Noun Phrase Chunking Based on Conditional Random Fields","authors":"Nguyen Thi Huong Thao, N. Thai, Nguyen Le Minh, Hà Quang Thụy","doi":"10.1109/KSE.2009.43","DOIUrl":null,"url":null,"abstract":"Noun phrase chunking is an important and useful task in many natural language processing applications. It is studied well for English, however with Vietnamese it is still an open problem. This paper presents a Vietnamese Noun Phrase chunking approach based on Conditional random fields (CRFs) models. We also describe a method to build Vietnamese corpus from a set of hand annotated sentences. For evaluation, we perform several experiments using different feature settings. Outcome results on our corpus show a high performance with the average of recall and precision 82.72% and 82.62% respectively.","PeriodicalId":347175,"journal":{"name":"2009 International Conference on Knowledge and Systems Engineering","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Knowledge and Systems Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/KSE.2009.43","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
Noun phrase chunking is an important and useful task in many natural language processing applications. It is studied well for English, however with Vietnamese it is still an open problem. This paper presents a Vietnamese Noun Phrase chunking approach based on Conditional random fields (CRFs) models. We also describe a method to build Vietnamese corpus from a set of hand annotated sentences. For evaluation, we perform several experiments using different feature settings. Outcome results on our corpus show a high performance with the average of recall and precision 82.72% and 82.62% respectively.