An automatic noun compound extraction from Arabic corpus

Abdulgabbar Saif, M. J. Aziz
Published in: 2011 International Conference on Semantic Technology and Information Retrieval (STAIR), 28 June 2011
DOI: 10.1109/STAIR.2011.5995793
Citations: 5

Abstract

The identification of noun compounds as multi-word lexical units is a very important task in natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. In this paper, we used a hybrid method, based on linguistic knowledge and statistical measures, to extract noun compounds from an Arabic corpus. For candidate identification, we used linguistic analysis tools such as lemmatization and part-of-speech (POS) tagging to filter the candidates and determine their variations. Association measures were computed for each candidate in order to rank the candidates. We then evaluated the association measures using the n-best evaluation method and reported the precision of each association measure on each n-best list. The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.
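The abstract does not spell out how candidates are scored or how the n-best evaluation is computed. The Python sketch below illustrates one common way to do both, under stated assumptions: each candidate is a two-word lemma pair that has already passed a POS filter, the log-likelihood ratio is Dunning's G² over a 2x2 contingency table of pair counts, and n-best precision is the fraction of the top n ranked candidates found in a manually judged reference list. Pointwise mutual information is included only as a second measure for comparison; the paper's actual set of measures, POS patterns and counting scheme may differ. All function names, the transliterated toy lemma pairs and the gold set are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Hypothetical types: a candidate is a (first lemma, second lemma) pair, and a
# measure maps (pair count, first-word count, second-word count, total) to a score.
Pair = tuple[str, str]
Measure = Callable[[int, int, int, int], float]


def log_likelihood_ratio(c12: int, c1: int, c2: int, n: int) -> float:
    """Dunning's log-likelihood ratio (G^2) from a 2x2 contingency table.

    c12: frequency of the pair (w1, w2); c1: frequency of w1 as first element;
    c2: frequency of w2 as second element; n: total number of pair tokens.
    """
    # Observed cells: (w1,w2), (w1,not w2), (not w1,w2), (not w1,not w2).
    observed = (c12, c1 - c12, c2 - c12, n - c1 - c2 + c12)
    row_totals = (c1, c1, n - c1, n - c1)
    col_totals = (c2, n - c2, c2, n - c2)
    g2 = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        if o > 0:  # empty cells contribute 0 (limit of o*ln(o/e) as o -> 0)
            g2 += o * math.log(o * n / (r * c))  # o * ln(o / expected)
    return 2.0 * g2


def pmi(c12: int, c1: int, c2: int, n: int) -> float:
    """Pointwise mutual information: log2 of P(w1, w2) / (P(w1) * P(w2))."""
    return math.log2(c12 * n / (c1 * c2))


def rank_candidates(pairs: Sequence[Pair], measure: Measure) -> list[Pair]:
    """Score each distinct candidate pair and sort from most to least associated."""
    pair_freq = Counter(pairs)
    first_freq = Counter(w1 for w1, _ in pairs)
    second_freq = Counter(w2 for _, w2 in pairs)
    n = len(pairs)
    scores = {
        p: measure(c12, first_freq[p[0]], second_freq[p[1]], n)
        for p, c12 in pair_freq.items()
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))


def n_best_precision(ranked: Sequence[Pair], gold: set[Pair], n_best: int) -> float:
    """Fraction of the top n_best ranked candidates that appear in the gold list."""
    top = ranked[:n_best]
    if not top:
        return 0.0
    return sum(p in gold for p in top) / len(top)


if __name__ == "__main__":
    # Toy data (invented): lemma pair tokens assumed to have passed a POS filter.
    candidates = [("majlis", "amn")] * 3 + [
        ("yawm", "ahad"), ("yawm", "jadid"), ("sana", "jadid"), ("sana", "ahad"),
    ]
    gold = {("majlis", "amn"), ("yawm", "ahad")}  # hypothetical manual judgments
    for name, measure in [("LLR", log_likelihood_ratio), ("PMI", pmi)]:
        ranked = rank_candidates(candidates, measure)
        print(name, [n_best_precision(ranked, gold, k) for k in (1, 3, 5)])
```

Comparing measures this way mirrors the evaluation described above: each measure produces its own ranking, and precision is reported at several cut-offs. On this toy input both measures place ("majlis", "amn") first; the numbers say nothing about the paper's reported results.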