Efficacy of Arabic named-entity recognition

2015 International Conference on Electrical Engineering and Informatics (ICEEI), Pub Date: 2015-12-17, DOI: 10.1109/ICEEI.2015.7352553

Suhad Al-Shoukry, N. Omar


Citations: 0

Abstract

Named-entity recognition (NER) research is a relatively new field for the Arabic language, although it has reached a mature stage for other languages. As Arabic has more speech sounds than many other languages, there is some lack of uniformity in Arabic writing styles. Transcription can become ambiguous, and the same word can be written in several different ways. Spelling mistakes can arise from this same phenomenon. Arabic also has both long and short vowels, which can lead to further ambiguity. In the Arabic world, NER research has typically been limited in capacity or coverage. With this in mind, we propose a method for analysing the structure of Arabic named-entity recognition and sentence object recognition by combining prior information and conditional random fields. The proposed method leads to a 2.67% performance improvement per sentence compared with existing methods.
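The spelling variation described in the abstract can be illustrated with a small orthographic normalization step of the kind often applied before Arabic NER. The sketch below is not taken from the paper: the function name `normalize_arabic` and the particular character mappings are assumptions chosen for illustration, and the authors' actual preprocessing is not specified in the abstract.

```python
import re

# Short-vowel marks, tanwin, shadda and sukun occupy U+064B..U+0652.
# Removing them is a common (assumed) way to collapse vowelled and
# unvowelled spellings of the same word.
_DIACRITICS = re.compile("[\u064B-\u0652]")

# Tatweel (kashida) is a purely typographic elongation character.
_TATWEEL = "\u0640"

# Hypothetical mapping of frequently interchanged letter forms.
_VARIANTS = str.maketrans({
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> ya
    "\u0629": "\u0647",  # ta marbuta            -> ha
})


def normalize_arabic(text: str) -> str:
    """Collapse common spelling variants so equal words compare equal."""
    text = _DIACRITICS.sub("", text)      # drop short vowels and related marks
    text = text.replace(_TATWEEL, "")     # drop elongation characters
    return text.translate(_VARIANTS)      # unify interchangeable letter forms


if __name__ == "__main__":
    # Two spellings of the same name, with and without hamza on the alef.
    print(normalize_arabic("\u0623\u062D\u0645\u062F") ==
          normalize_arabic("\u0627\u062D\u0645\u062F"))
```

Normalization of this kind trades some information for consistency: stripping short vowels merges words that differ only in vowelling, which is one reason the ambiguity mentioned in the abstract cannot be resolved by preprocessing alone.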
