Vietnamese Span-based Constituency Parsing with BERT Embedding
Authors: Thi-Thu-Hong Phan, Ngoc-Thanh-Tung Huynh, Thinh Hung Truong, Tuan-An Dao, Dinh Dien
Venue: 2019 11th International Conference on Knowledge and Systems Engineering (KSE), published 2019-10-01
DOI: 10.1109/KSE.2019.8919467 (https://doi.org/10.1109/KSE.2019.8919467)
The syntactic structure of sentences obtained from constituency parsing is fundamental information for many Natural Language Processing tasks. However, because of the lack of available resources and the complex linguistic features of Vietnamese, constituency parsing has not received enough research attention in this language. To the best of our knowledge, the study presented in this paper is one of the first to explore this task for Vietnamese. We present a span-based approach that represents spans using contextualized pre-trained embeddings in order to obtain optimal parse trees for Vietnamese sentences. Our experiments indicate that the system achieves promising results on the VLSP Vietnamese Treebank dataset, significantly outperforming existing methods. These results support the view that encoding context information into word representations is effective in improving parsing performance for Vietnamese. Consequently, this idea can be generalized to other tasks, such as dependency parsing, and to other low-resource languages.
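To make the span-based idea more concrete, the following is a minimal sketch of chart decoding over span scores, the general technique commonly used in span-based constituency parsers. It is not taken from the paper: the function name `best_tree`, the unlabeled binary trees, and the toy scores are all assumptions made for illustration. In a real system the score of each span would come from a neural scorer applied to contextualized embeddings such as BERT; here the scores are hard-coded.

```python
from functools import lru_cache


def best_tree(n, span_score):
    """Find the highest-scoring unlabeled binary tree over n words.

    span_score(i, j) is a hypothetical callable returning the score of the
    span covering words i..j-1. Returns (total_score, spans), where spans
    lists every span in the chosen tree in pre-order.
    """

    @lru_cache(maxsize=None)
    def solve(i, j):
        own = span_score(i, j)
        if j - i == 1:
            # A single word is a leaf span: nothing to split.
            return own, ((i, j),)
        best_score, best_spans = None, None
        # Try every split point and keep the best pair of sub-trees.
        for k in range(i + 1, j):
            left_score, left_spans = solve(i, k)
            right_score, right_spans = solve(k, j)
            candidate = left_score + right_score
            if best_score is None or candidate > best_score:
                best_score, best_spans = candidate, left_spans + right_spans
        return own + best_score, ((i, j),) + best_spans

    total, spans = solve(0, n)
    return total, list(spans)


# Toy scores (assumed, not from the paper): the span (0, 2) is preferred
# over the competing span (1, 3) in a three-word sentence.
toy_scores = {(0, 2): 2.0, (1, 3): 1.0}
total, spans = best_tree(3, lambda i, j: toy_scores.get((i, j), 0.0))
print(total, spans)
```

Because the tree score is a sum of independent span scores, dynamic programming over split points finds the best tree exactly in cubic time, which is why the quality of the span representation is the main thing a parser of this kind has to learn.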