使用可用的形态分析器和词性标记器标记古典阿拉伯语文本

A. Alosaimy, E. Atwell
{"title":"使用可用的形态分析器和词性标记器标记古典阿拉伯语文本","authors":"A. Alosaimy, E. Atwell","doi":"10.21248/jlcl.32.2017.212","DOIUrl":null,"url":null,"abstract":"Focusing on Classical Arabic, this paper in its first part evaluates morphological analysers and POS taggers that are available freely for research purposes, are designed for Modern Standard Arabic (MSA) or Classical Arabic (CA), are able to analyse all forms of words, and have academic credibility. We list and compare supported features of each tool, and how they differ in the format of the output, segmentation, Part-of-Speech (POS) tags and morphological features. We demonstrate a sample output of each analyser against one CA fully-vowelized sentence. This evaluation serves as a guide in choosing the best tool that suits research needs. In the second part, we report the accuracy and coverage of tagging a set of classical Arabic vocabulary extracted from classical texts. The results show a drop in the accuracy and coverage and suggest an ensemble method might increase accuracy and coverage for classical Arabic.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":"{\"title\":\"Tagging Classical Arabic Text using Available Morphological Analysers and Part of Speech Taggers\",\"authors\":\"A. Alosaimy, E. Atwell\",\"doi\":\"10.21248/jlcl.32.2017.212\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Focusing on Classical Arabic, this paper in its first part evaluates morphological analysers and POS taggers that are available freely for research purposes, are designed for Modern Standard Arabic (MSA) or Classical Arabic (CA), are able to analyse all forms of words, and have academic credibility. We list and compare supported features of each tool, and how they differ in the format of the output, segmentation, Part-of-Speech (POS) tags and morphological features. We demonstrate a sample output of each analyser against one CA fully-vowelized sentence. This evaluation serves as a guide in choosing the best tool that suits research needs. In the second part, we report the accuracy and coverage of tagging a set of classical Arabic vocabulary extracted from classical texts. The results show a drop in the accuracy and coverage and suggest an ensemble method might increase accuracy and coverage for classical Arabic.\",\"PeriodicalId\":402489,\"journal\":{\"name\":\"J. Lang. Technol. Comput. Linguistics\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"27\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"J. Lang. Technol. Comput. Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21248/jlcl.32.2017.212\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Lang. Technol. Comput. Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21248/jlcl.32.2017.212","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27

摘要

以古典阿拉伯语为重点,本文的第一部分评估了可免费用于研究目的的形态分析仪和词性标注器,这些分析仪和词性标注器专为现代标准阿拉伯语(MSA)或古典阿拉伯语(CA)设计,能够分析所有形式的单词,并具有学术可信度。我们列出并比较了每种工具支持的功能,以及它们在输出格式、分词、词性标记和形态特征方面的不同之处。我们演示了每个分析器针对一个CA全元音句子的示例输出。这种评估可以作为选择适合研究需要的最佳工具的指南。在第二部分中,我们报告了从经典文本中提取的一组经典阿拉伯语词汇的标记准确性和覆盖范围。结果表明,精度和覆盖率下降,并建议一个集成方法可能提高精度和覆盖率的古典阿拉伯语。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Tagging Classical Arabic Text using Available Morphological Analysers and Part of Speech Taggers
Focusing on Classical Arabic, this paper in its first part evaluates morphological analysers and POS taggers that are available freely for research purposes, are designed for Modern Standard Arabic (MSA) or Classical Arabic (CA), are able to analyse all forms of words, and have academic credibility. We list and compare supported features of each tool, and how they differ in the format of the output, segmentation, Part-of-Speech (POS) tags and morphological features. We demonstrate a sample output of each analyser against one CA fully-vowelized sentence. This evaluation serves as a guide in choosing the best tool that suits research needs. In the second part, we report the accuracy and coverage of tagging a set of classical Arabic vocabulary extracted from classical texts. The results show a drop in the accuracy and coverage and suggest an ensemble method might increase accuracy and coverage for classical Arabic.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信