Sentence-aligned parallel corpus Amazigh-English

Miftah Nina, Ataa Allah Fadoua, Taghbalout Imane
Published in: 2017 8th International Conference on Information and Communication Systems (ICICS)
Publication date: 2017-04-01
DOI: 10.1109/IACS.2017.7921946 (https://doi.org/10.1109/IACS.2017.7921946)
Citations: 0

Abstract

In recent years, research in Natural Language Processing has shown growing interest in under-resourced languages. Amazigh is the autochthonous language of North Africa. However, it was not until 2011, after years of persecution, that it became a constitutionally official language in Morocco, and it is still considered one of the under-resourced languages. The question is: “How can the Amazigh language catch up with advanced languages?” Motivated by these considerations, we describe our effort to develop an Amazigh-English parallel corpus intended for use in linguistic research, teaching, and natural language processing applications, primarily machine translation. To the best of our knowledge, this is the first Amazigh-English parallel corpus. The corpus is sentence-aligned and contains 20,726 sentences. The alignment was done automatically, while the evaluation was done manually. The experimental results are satisfactory, exceeding 90%.