Chunking Arabic texts using Conditional Random Fields

2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA) Pub Date : 2014-11-01 DOI:10.1109/AICCSA.2014.7073230

Nabil Khoufi, Chafik Aloulou, Lamia Hadrich Belguith


Citations: 10

Abstract

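Chunking, or shallow syntactic parsing, is a task of interest to many natural language processing applications. It is harder for Arabic, whose specific features make it quite different from other natural languages and more ambiguous to process. In this paper, we present a method for chunking Arabic texts based on supervised learning. We use the Conditional Random Fields algorithm and the Penn Arabic Treebank to train the model. For the experiments, we use more than 10,100 sentences as training data and 2,524 sentences for testing. The method is evaluated by computing the accuracy of the generated model, and the results are very encouraging.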
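The evaluation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that chunks are encoded as per-token BIO tags and that accuracy is measured per token; the abstract states neither, so both are assumptions. The helper names `to_bio` and `token_accuracy` and the placeholder tokens are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A chunked sentence: a list of (label, tokens) pairs.
# The label "O" marks tokens that belong to no chunk (an assumed convention).
Chunk = Tuple[str, List[str]]


def to_bio(chunks: List[Chunk]) -> List[str]:
    """Flatten (label, tokens) chunks into one BIO tag per token."""
    tags: List[str] = []
    for label, tokens in chunks:
        for i in range(len(tokens)):
            if label == "O":
                tags.append("O")
            else:
                # First token of a chunk gets B-, the rest get I-.
                tags.append(("B-" if i == 0 else "I-") + label)
    return tags


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    total = 0
    correct = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("tag sequences must have equal length")
        total += len(gold_tags)
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
    return correct / total if total else 0.0


# Placeholder tokens stand in for treebank words.
gold_sentence = to_bio([("NP", ["w1", "w2"]), ("VP", ["w3"]), ("O", ["w4"])])
pred_sentence = ["B-NP", "B-NP", "B-VP", "O"]  # one boundary error on w2

print(gold_sentence)
print(token_accuracy([gold_sentence], [pred_sentence]))
```

Token-level accuracy is only one possible reading of "model accuracy"; chunk-level precision, recall, and F1 are also common for this task and would score the boundary error above differently.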



