Leveraging on Cross Linguistic Similarities to Reduce Grammar Development Effort for the Under-Resourced Languages: a Case of Kenyan Bantu Languages

Benson Kituku, Wanjiku Nganga, Lawrence Muchemi
Published in: 2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)
Publication date: 2021-11-22
DOI: https://doi.org/10.1109/ict4da53266.2021.9672222
Citations: 1

Abstract

Rule-based grammar development is labor-intensive in both time and expertise, especially for morphologically complex and under-resourced languages. Nevertheless, such grammars are needed for deep natural language processing, for generating well-formed output, or for both. To address this challenge, this paper develops a shared, multilingual, wide-coverage grammar for a subset of Kenyan Bantu languages in Grammatical Framework (GF), leveraging cross-linguistic similarities through two grammar engineering strategies: grammar porting and grammar sharing. The shared grammar was developed using a morphology-driven approach, in which the lexicons are defined first, followed by the inflection regular expressions and finally the syntax production rules. The resulting congruent Bantu parameterized grammar had shareability of 100%, 68.75%, 65.3% and 89.57% for category linearizations, parameters, paradigms, and syntax rules, respectively, while portability (reuse with modification) was exhibited in paradigms, parameters, and syntax rules at 14.29%, 18.75% and 10.43%, respectively. The research concludes that leveraging the cross-linguistic similarities of principles and parameters significantly reduces the effort of developing multilingual grammars. Its contribution is the Bantu parameterized grammar, which demonstrates how the effort of developing the rule base has been significantly reduced for languages where data is a scarce commodity.
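
The idea of grammar sharing can be sketched in a few lines. The example below is written in Python rather than GF and is only an illustration under stated assumptions: `LanguageParams`, `linearize_noun`, and `share_percentages` are hypothetical names, the noun-class prefixes are textbook examples chosen for this sketch rather than data from the paper, and the four-item inventory is invented to show how a shareability percentage might be computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (noun class pair, number) -> prefix. Illustrative assumption: these are
# textbook class 1/2 prefixes, not entries from the paper's grammar.
Prefixes = Dict[Tuple[str, str], str]


@dataclass(frozen=True)
class LanguageParams:
    """Language-specific parameters plugged into the shared rules."""
    name: str
    noun_prefix: Prefixes


def linearize_noun(params: LanguageParams, stem: str,
                   noun_class: str, number: str) -> str:
    """Shared rule: the same code serves every language; only params differ."""
    return params.noun_prefix[(noun_class, number)] + stem


def share_percentages(statuses: List[str]) -> Dict[str, float]:
    """Percentage of grammar items per status label, rounded to 2 places."""
    total = len(statuses)
    return {s: round(100 * statuses.count(s) / total, 2)
            for s in sorted(set(statuses))}


swahili = LanguageParams("Swahili", {("1/2", "sg"): "m", ("1/2", "pl"): "wa"})
gikuyu = LanguageParams("Gikuyu", {("1/2", "sg"): "mũ", ("1/2", "pl"): "a"})

print(linearize_noun(swahili, "tu", "1/2", "sg"))   # mtu
print(linearize_noun(swahili, "tu", "1/2", "pl"))   # watu
print(linearize_noun(gikuyu, "ndũ", "1/2", "sg"))   # mũndũ
print(linearize_noun(gikuyu, "ndũ", "1/2", "pl"))   # andũ

# Hypothetical inventory: three rules reused unchanged, one reused with edits.
print(share_percentages(["shared", "shared", "shared", "ported"]))
```

Here the rule body is written once and each language contributes only a parameter table, which is the intuition behind counting how many items are shared unchanged versus ported with modification.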