Tagging Classical Arabic Text using Available Morphological Analysers and Part of Speech Taggers

J. Lang. Technol. Comput. Linguistics. Publication date: 2017-07-01. DOI: 10.21248/jlcl.32.2017.212

A. Alosaimy, E. Atwell


Citations: 27

Abstract




Focusing on Classical Arabic, the first part of this paper evaluates morphological analysers and POS taggers that are freely available for research purposes, are designed for Modern Standard Arabic (MSA) or Classical Arabic (CA), are able to analyse all forms of words, and have academic credibility. We list and compare the features each tool supports, and how the tools differ in output format, segmentation, Part-of-Speech (POS) tags and morphological features. We show a sample output of each analyser for one fully vowelized CA sentence. This evaluation serves as a guide for choosing the tool that best suits a given research need. In the second part, we report the accuracy and coverage of tagging a set of Classical Arabic vocabulary extracted from classical texts. The results show a drop in accuracy and coverage, and suggest that an ensemble method might increase both for Classical Arabic.
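To make the terms in the abstract concrete, here is a minimal sketch of how accuracy, coverage and a simple ensemble could be computed. It is not the authors' method. The definitions are assumptions: coverage is the fraction of words for which a tool returns any tag, accuracy is the fraction of all words tagged correctly, and the ensemble is a plain per-word majority vote. The tagger names, tags and data are hypothetical toy values, not results from the paper.

```python
from collections import Counter
from typing import List, Optional, Sequence

# None means the tool returned no analysis for that word (assumed convention).
Tag = Optional[str]


def coverage(predicted: Sequence[Tag]) -> float:
    """Fraction of words for which a tool produced any tag."""
    return sum(tag is not None for tag in predicted) / len(predicted)


def accuracy(predicted: Sequence[Tag], gold: Sequence[str]) -> float:
    """Fraction of all words whose predicted tag equals the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def majority_vote(outputs: Sequence[Sequence[Tag]]) -> List[Tag]:
    """Combine several taggers word by word, ignoring missing analyses."""
    combined: List[Tag] = []
    for votes in zip(*outputs):
        counts = Counter(tag for tag in votes if tag is not None)
        # If no tool analysed the word, the ensemble has no analysis either.
        combined.append(counts.most_common(1)[0][0] if counts else None)
    return combined


# Toy data: three hypothetical taggers over four words.
gold = ["NOUN", "VERB", "PREP", "NOUN"]
tagger_a = ["NOUN", "VERB", None, "ADJ"]
tagger_b = ["NOUN", None, "PREP", "NOUN"]
tagger_c = ["ADJ", "VERB", "PREP", "NOUN"]

ensemble = majority_vote([tagger_a, tagger_b, tagger_c])

for name, tags in [("a", tagger_a), ("b", tagger_b),
                   ("c", tagger_c), ("ensemble", ensemble)]:
    print(name, "coverage:", coverage(tags), "accuracy:", accuracy(tags, gold))
```

In this toy case the vote fills the gaps left by individual taggers, which is the intuition behind the abstract's suggestion. A real ensemble would first have to map each tool's segmentation and tagset onto a common scheme, since the abstract notes that the tools differ on both.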
