The Guidelines of Building a Treebank for Modern Standard Arabic

Amena Dheif, Ahmed Abd El Ghany, Sameh Al Ansary
2022 20th International Conference on Language Engineering (ESOLEC), published 2022-10-12
DOI: https://doi.org/10.1109/ESOLEC54569.2022.10009330

Abstract

Treebanks are among the most needed and most widely used linguistic resources in natural language processing (NLP) and natural language understanding (NLU). Arabic has only two constituency-based treebanks and a number of dependency treebanks. This study presents guidelines for building a parsed treebank for Modern Standard Arabic (MSA). The guidelines cover, first, the choice of grammar formalism; then the genre and size of the treebank; and finally its annotation layers. The study also shows that traditional Arabic grammar is better suited to describing Arabic syntax than any of the modern syntactic theories. Working within traditional Arabic grammar also helps avoid the errors that the available treebank fell into as a result of following guidelines that do not suit Arabic grammar. The study adopts three annotation layers: a morphological layer, a syntactic layer, and a grammatical function layer. The resulting tree is detailed and rich, which the researchers prefer to a large amount of poorly and shallowly annotated data.
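As an illustration only, the three annotation layers named in the abstract could be carried on a single constituency-tree node roughly as sketched below. The abstract does not give the authors' tag set or file format, so everything here is an assumption made for the example: the `Node` class, the labels `S`, `V`, `NP`, `N`, the function tags `PRED`, `SUBJ`, `OBJ`, the morphological feature names, and the transliterated sentence "kataba al-waladu al-darsa" ("the boy wrote the lesson").

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One constituency-tree node carrying three annotation layers.

    All tag names used with this class are hypothetical placeholders,
    not the tag set of the paper.
    """
    label: str                                   # syntactic layer
    function: Optional[str] = None               # grammatical function layer
    token: Optional[str] = None                  # surface form (leaves only)
    morphology: Optional[Dict[str, str]] = None  # morphological layer (leaves only)
    children: List["Node"] = field(default_factory=list)

    def to_brackets(self) -> str:
        """Render the subtree in bracketed notation."""
        head = self.label if self.function is None else f"{self.label}-{self.function}"
        if self.token is not None:
            return f"({head} {self.token})"
        inner = " ".join(child.to_brackets() for child in self.children)
        return f"({head} {inner})"

    def leaves(self) -> List["Node"]:
        """Return the token-bearing nodes in left-to-right order."""
        if self.token is not None:
            return [self]
        result: List["Node"] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# A verbal sentence: verb, then subject, then object.
tree = Node("S", children=[
    Node("V", function="PRED", token="kataba",
         morphology={"pos": "verb", "aspect": "perfective"}),
    Node("NP", function="SUBJ", children=[
        Node("N", token="al-waladu",
             morphology={"pos": "noun", "case": "nominative"}),
    ]),
    Node("NP", function="OBJ", children=[
        Node("N", token="al-darsa",
             morphology={"pos": "noun", "case": "accusative"}),
    ]),
])

print(tree.to_brackets())
```

Keeping the function tag separate from the syntactic label means each layer can be queried or revised on its own; the combined `LABEL-FUNCTION` form appears only when the tree is printed.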