Building a treebank for Vietnamese dependency parsing

The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF). Pub Date: 2013-11-01. DOI: 10.1109/RIVF.2013.6719884

Thi Luong Nguyen, L. My, Viet Hung Nguyen, Huyen Thi Minh Nguyen, Hong Phuong Le


Citations: 30

Abstract



Vietnamese syntactic parsing, especially constituency parsing, has recently been tackled by several research groups. A joint effort by the Vietnamese language processing community has produced VietTreebank, a reference parsed corpus of about 10,000 sentences for the constituency parsing task. In this paper, we present our work on building a reference treebank, based on VietTreebank, for the dependency parsing task, which has not yet been well studied for Vietnamese. First, we define a dependency label set by adapting the dependency scheme developed by the NLP group at Stanford University and taking into account the particularities of Vietnamese grammar. Then we propose an algorithm to convert a constituency treebank into a dependency treebank. The algorithm is tested on a set of 100 sentences from the VietTreebank corpus and gives very good results. Finally, we carry out an experiment on Vietnamese dependency parsing using the MaltParser tool and the dependency treebank converted from VietTreebank.
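The abstract does not spell out the conversion algorithm, so the following is only a minimal sketch of the general technique that constituency-to-dependency converters commonly rely on: head percolation. It is not the authors' algorithm. The `Node` class, the `HEAD_RULES` table, the phrase and tag names, and the three-word example sentence are all assumptions made for illustration, and the sketch produces unlabeled arcs only; it does not attempt to reproduce the paper's dependency label set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str                      # phrase label, or POS tag for a leaf
    word: str = ""                  # surface form; non-empty only for leaves
    index: int = 0                  # 1-based token position; set only for leaves
    children: List["Node"] = field(default_factory=list)


# Hypothetical head rules: for each phrase label, the child labels to try,
# in priority order. A real converter would need a much fuller table.
HEAD_RULES: Dict[str, List[str]] = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
}


def find_head(node: Node, arcs: List[Tuple[int, int]]) -> Node:
    """Return the head leaf of `node`, appending (dependent, head) arcs."""
    if not node.children:
        return node
    heads = [find_head(child, arcs) for child in node.children]
    chosen = 0  # fall back to the leftmost child when no rule matches
    for wanted in HEAD_RULES.get(node.label, []):
        match = next(
            (i for i, c in enumerate(node.children) if c.label == wanted), None
        )
        if match is not None:
            chosen = match
            break
    head = heads[chosen]
    # Every non-head child's head word becomes a dependent of the phrase head.
    for i, other in enumerate(heads):
        if i != chosen:
            arcs.append((other.index, head.index))
    return head


def to_dependencies(root: Node) -> List[Tuple[int, int]]:
    """Convert a constituency tree into sorted (dependent, head) index pairs."""
    arcs: List[Tuple[int, int]] = []
    head = find_head(root, arcs)
    arcs.append((head.index, 0))  # 0 stands for the artificial root
    return sorted(arcs)


# Illustrative tree for "Tôi đọc sách" ("I read books"); tags are assumed.
tree = Node("S", children=[
    Node("NP", children=[Node("P", "Tôi", 1)]),
    Node("VP", children=[
        Node("V", "đọc", 2),
        Node("NP", children=[Node("N", "sách", 3)]),
    ]),
])
arcs = to_dependencies(tree)
print(arcs)
```

In this sketch the verb heads the sentence and both the subject and the object attach to it. Assigning a label to each arc would be a separate step applied on top of the arcs.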
