Thi Luong Nguyen, L. My, Viet Hung Nguyen, Huyen Thi Minh Nguyen, Hong Phuong Le
Published in: The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF), 2013-11-01. DOI: 10.1109/RIVF.2013.6719884 (https://doi.org/10.1109/RIVF.2013.6719884)
Building a treebank for Vietnamese dependency parsing
The problem of Vietnamese syntactic parsing, especially constituency parsing, has recently been tackled by several research groups. A joint effort by the Vietnamese language processing community has produced VietTreebank, a reference parsed corpus of about 10,000 sentences for the constituency parsing task. In this paper, we present our work on building a reference treebank, based on VietTreebank, for the dependency parsing task, which has not yet been well studied for Vietnamese. First, we define a dependency label set by adapting the dependency scheme developed by the NLP group at Stanford University and taking into account the particularities of Vietnamese grammar. Then we propose an algorithm to convert a constituency treebank into a dependency treebank. The algorithm is tested on a set of 100 sentences from the VietTreebank corpus and gives very good results. Finally, we carry out an experiment on Vietnamese dependency parsing using the MaltParser tool and the dependency treebank converted from VietTreebank.
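To make the idea of constituency-to-dependency conversion concrete, the following is a minimal sketch of the generic head-rule approach commonly used for such conversions. It is not the algorithm proposed in the paper, whose details the abstract does not give. The `Node` class, the `HEAD_RULES` table, the tag names, and the example sentence are all assumptions made for illustration, and the sketch produces unlabeled arcs only, whereas the paper also assigns dependency labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A constituency tree node (hypothetical structure for this sketch)."""
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaves
    index: int = 0                               # 1-based token position (leaves)


# Illustrative head rules, NOT the paper's: for each phrase label,
# the child labels to look for, in priority order.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "P", "NP"],
}


def find_head_child(node: Node) -> Node:
    """Pick the head child by rule priority, falling back to the leftmost child."""
    for wanted in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == wanted:
                return child
    return node.children[0]


def convert(node: Node, arcs: List[Tuple[int, int]]) -> Node:
    """Return the lexical head leaf of `node`, appending (dependent, head) arcs."""
    if node.word is not None:
        return node
    head_child = find_head_child(node)
    heads = [(child, convert(child, arcs)) for child in node.children]
    head_leaf = next(leaf for child, leaf in heads if child is head_child)
    # Every non-head child's lexical head depends on the phrase's lexical head.
    for child, leaf in heads:
        if child is not head_child:
            arcs.append((leaf.index, head_leaf.index))
    return head_leaf


def to_dependencies(root: Node) -> List[Tuple[int, int]]:
    """Convert a tree to sorted (dependent_index, head_index) pairs; 0 is the root."""
    arcs: List[Tuple[int, int]] = []
    head = convert(root, arcs)
    arcs.append((head.index, 0))
    return sorted(arcs)


# Example: "Tôi đọc sách" ("I read books") with an assumed bracketing
# (S (NP (P Tôi)) (VP (V đọc) (NP (N sách)))).
tree = Node("S", [
    Node("NP", [Node("P", word="Tôi", index=1)]),
    Node("VP", [
        Node("V", word="đọc", index=2),
        Node("NP", [Node("N", word="sách", index=3)]),
    ]),
])

print(to_dependencies(tree))  # [(1, 2), (2, 0), (3, 2)]
```

In this sketch the verb "đọc" becomes the root, and both "Tôi" and "sách" attach to it. A full converter would additionally map each arc to a dependency label and would need head rules covering the whole VietTreebank tag set.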