Arabic collocations extraction using Gate

2010 International Conference on Machine and Web Intelligence. Pub Date: 2010-11-29. DOI: 10.1109/ICMWI.2010.5648038

S. Zaidi, M. Laskri, Ahmed Abdelali

Citations: 37

Abstract

Information extraction (IE) from corpora is the analysis of texts to extract structured information such as named entities (NE): names of persons, organizations, addresses, dates, locations, etc. … GATE is a Java software toolkit, in development since 1995 and widely used by many communities (scientists, companies, teachers, students) for natural language processing. We experimented with GATE for term extraction by writing new JAPE (Java Annotation Pattern Engine) rules and applying them to a tagged corpus developed at Leeds University. The extracted terms will be used to build text-based ontologies. In our case, the ontology will be incorporated into a search engine to expand Web queries in the specified domain.
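The idea of matching patterns over a part-of-speech-tagged corpus can be illustrated with a minimal Python sketch. This is not the authors' JAPE grammar and does not use the GATE API; the tag names (`NOUN`, `ADJ`), the pattern list, and the function name `extract_candidates` are assumptions made for illustration, and the tagset of the Leeds corpus may differ.

```python
from collections import Counter

# Hypothetical tag-pair patterns for collocation candidates.
# These tag names are placeholders, not the tagset of the Leeds corpus.
PATTERNS = {("NOUN", "ADJ"), ("NOUN", "NOUN")}


def extract_candidates(tagged_sentences, patterns=PATTERNS):
    """Count adjacent word pairs whose tags match one of the patterns."""
    counts = Counter()
    for sentence in tagged_sentences:
        # Slide a two-token window over each tagged sentence.
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if (t1, t2) in patterns:
                counts[(w1, w2)] += 1
    return counts


# Toy input: each sentence is a list of (word, tag) pairs.
sample = [
    [("اللغة", "NOUN"), ("العربية", "ADJ"), ("جميلة", "ADJ")],
    [("تعلم", "VERB"), ("اللغة", "NOUN"), ("العربية", "ADJ")],
]

candidates = extract_candidates(sample)
for pair, freq in candidates.most_common():
    print(pair, freq)
```

In a JAPE grammar the same constraint would be written as a pattern over token annotations and their category features; the sketch only mirrors that logic on plain lists, and a real system would typically add a frequency or association threshold before accepting a pair as a term.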
