Leveraging on Cross Linguistic Similarities to Reduce Grammar Development Effort for the Under-Resourced Languages: a Case of Kenyan Bantu Languages
Authors: Benson Kituku, Wanjiku Nganga, Lawrence Muchemi
Published in: 2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)
Publication date: 2021-11-22
DOI: 10.1109/ict4da53266.2021.9672222 (https://doi.org/10.1109/ict4da53266.2021.9672222)
Citations: 1
Abstract
Rule-based grammar development is labor-intensive, demanding both time and expert knowledge, especially for morphologically complex and under-resourced languages. Nevertheless, such grammars are needed for deep natural language processing, for generating well-formed output, or for both. To address this challenge, this paper develops a shared, multilingual, wide-coverage grammar for a subset of Kenyan Bantu languages in Grammatical Framework (GF), leveraging cross-linguistic similarities through two grammar engineering strategies: grammar porting and grammar sharing. The shared grammar was developed using a morphology-driven approach, in which the lexicons are defined first, followed by the inflection regular expressions and finally the syntax production rules. The resulting congruent Bantu parameterized grammar achieved shareability of 100%, 68.75%, 65.3%, and 89.57% for category linearizations, parameters, paradigms, and syntax rules, respectively, while portability (modification) was 14.29%, 18.75%, and 10.43% for paradigms, parameters, and syntax rules, respectively. The research concludes that leveraging cross-linguistic similarities of principles and parameters significantly reduces the effort of developing multilingual grammars. Its contribution is the Bantu parameterized grammar, which demonstrates how the effort of developing the rule base was significantly reduced for languages where data is scarce.
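The idea of grammar sharing described in the abstract can be illustrated with a small sketch: one paradigm function is written once, and each language supplies only its own parameters. This is not the paper's GF code. It is a Python approximation under stated assumptions: the abstract does not name the languages covered, so Swahili and Gikuyu are used here only because their class 1/2 noun prefixes are widely documented, and the names `LanguageParams` and `noun_form` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageParams:
    """Language-specific parameters (hypothetical structure, not from the paper).

    prefixes maps (noun class pair, number) to the noun-class prefix.
    """
    name: str
    prefixes: dict


def noun_form(params: LanguageParams, stem: str, noun_class: str, number: str) -> str:
    """Shared paradigm: the same rule (prefix + stem) serves every language.

    Only the prefix table differs per language; the rule itself is reused as-is.
    """
    return params.prefixes[(noun_class, number)] + stem


# Illustrative parameter tables (assumption: the source does not list these languages).
SWAHILI = LanguageParams(
    "Swahili",
    {
        ("1/2", "sg"): "m", ("1/2", "pl"): "wa",
        ("7/8", "sg"): "ki", ("7/8", "pl"): "vi",
    },
)
GIKUYU = LanguageParams(
    "Gikuyu",
    {("1/2", "sg"): "mũ", ("1/2", "pl"): "a"},
)

if __name__ == "__main__":
    print(noun_form(SWAHILI, "tu", "1/2", "sg"), noun_form(SWAHILI, "tu", "1/2", "pl"))
    print(noun_form(SWAHILI, "tu", "7/8", "sg"), noun_form(SWAHILI, "tu", "7/8", "pl"))
    print(noun_form(GIKUYU, "ndũ", "1/2", "sg"), noun_form(GIKUYU, "ndũ", "1/2", "pl"))
```

In this sketch `noun_form` plays the role of a shared paradigm, while the two prefix tables play the role of language-specific parameters. A ported rule, by contrast, would be a copy of the shared function adjusted for one language, which is why porting counts as reuse with modification rather than reuse as-is.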