Vietnamese Span-based Constituency Parsing with BERT Embedding

Thi-Thu-Hong Phan, Ngoc-Thanh-Tung Huynh, Thinh Hung Truong, Tuan-An Dao, Dinh Dien
Published in: 2019 11th International Conference on Knowledge and Systems Engineering (KSE), 2019-10-01
DOI: 10.1109/KSE.2019.8919467 (https://doi.org/10.1109/KSE.2019.8919467)
Citation count: 3

Abstract

The syntactic structure of a sentence, as produced by constituency parsing, is fundamental information for many Natural Language Processing tasks. However, owing to the lack of available resources and the complex linguistic features of Vietnamese, constituency parsing has received little attention for this language. To the best of our knowledge, this study is one of the first to explore the task for Vietnamese. We present a span-based approach that represents spans using contextualized pre-trained embeddings in order to obtain optimal parse trees for Vietnamese sentences. Experiments show that our system achieves promising results on the VLSP Vietnamese Treebank dataset, significantly outperforming existing methods. These results support the view that encoding context information into word representations improves parsing performance for Vietnamese. Consequently, the idea can be generalized to other tasks, such as dependency parsing, and to other low-resource languages.
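
The abstract does not describe the decoder, so the following is only a minimal sketch of how span-based parsers commonly recover a tree, not the authors' implementation. It assumes every span (i, j) over the sentence already has a score (in a setting like the paper's, such scores would presumably be computed from contextualized embeddings) and uses CKY-style dynamic programming to find the binary bracketing with the highest total score. The names `best_spans`, `toy_scores`, and `toy_score` are hypothetical and serve illustration only.

```python
from functools import lru_cache
from typing import Callable, Dict, Tuple

Span = Tuple[int, int]  # half-open word interval [i, j)


def best_spans(n: int, span_score: Callable[[int, int], float]):
    """Return (total_score, spans) for the best binary bracketing of n words.

    The score of a tree is the sum of the scores of all spans it contains.
    """

    @lru_cache(maxsize=None)
    def solve(i: int, j: int):
        own = span_score(i, j)
        if j - i == 1:  # a single word: no split point to choose
            return own, ((i, j),)
        # Try every split point k and keep the best pair of subtrees.
        candidates = []
        for k in range(i + 1, j):
            left_score, left_spans = solve(i, k)
            right_score, right_spans = solve(k, j)
            candidates.append((left_score + right_score, left_spans + right_spans))
        sub_score, sub_spans = max(candidates, key=lambda c: c[0])
        return own + sub_score, ((i, j),) + sub_spans

    return solve(0, n)


# Hypothetical scores for a three-word sentence; unlisted spans score 0.0.
toy_scores: Dict[Span, float] = {(0, 2): 2.0, (1, 3): 0.5}


def toy_score(i: int, j: int) -> float:
    return toy_scores.get((i, j), 0.0)


score, spans = best_spans(3, toy_score)
print(score, spans)
```

With these toy scores the decoder groups the first two words together, because the span (0, 2) scores higher than the competing span (1, 3). Labels are omitted for brevity; a labeled variant would additionally pick the best label for each span.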