Arabic collocations extraction using Gate
S. Zaidi, M. Laskri, Ahmed Abdelali
2010 International Conference on Machine and Web Intelligence, published 2010-11-29
DOI: https://doi.org/10.1109/ICMWI.2010.5648038
Citations: 37
Abstract
Information extraction (IE) from corpora is the analysis of texts in order to extract structured information such as Named Entities (NE), which may be names of persons, organizations, addresses, dates, locations, etc. … GATE is a software toolkit written in Java, developed since 1995 and widely used worldwide by many communities (scientists, companies, teachers, students) for natural language processing. We experimented with GATE for extracting terms by writing new JAPE (Java Annotation Pattern Engine) rules and applying them to a tagged corpus developed at Leeds University. These terms will be used to build text-based ontologies. In our case, the ontology will be incorporated into a search engine to expand Web queries in the specified domain.
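As a rough illustration of what pattern-based term extraction over a tagged corpus can look like, the Python sketch below scans part-of-speech-tagged tokens for fixed tag sequences. It is not JAPE and does not reproduce the authors' rules: the tag names (`NOUN`, `ADJ`, `VERB`), the two patterns, the function `extract_candidates`, and the placeholder words are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical tag patterns; the tag set of the corpus used in the
# paper is not given in the abstract, so these names are assumed.
PATTERNS: List[Tuple[str, ...]] = [("NOUN", "ADJ"), ("NOUN", "NOUN")]


def extract_candidates(
    tokens: Sequence[Token],
    patterns: Sequence[Tuple[str, ...]] = PATTERNS,
) -> List[Tuple[str, ...]]:
    """Return word sequences whose tags match one of the patterns."""
    found = []
    for i in range(len(tokens)):
        for pattern in patterns:
            window = tokens[i:i + len(pattern)]
            # Skip windows that run past the end of the sentence.
            if len(window) != len(pattern):
                continue
            if all(tag == want for (_, tag), want in zip(window, pattern)):
                found.append(tuple(word for word, _ in window))
    return found


# Toy tagged sentence with placeholder words (not taken from the corpus).
sentence = [
    ("w1", "NOUN"), ("w2", "ADJ"), ("w3", "VERB"),
    ("w4", "NOUN"), ("w5", "NOUN"),
]
candidates = extract_candidates(sentence)
print(candidates)
```

In a setting like the one the abstract describes, candidate terms of this kind would then feed the ontology-building step; how the candidates are filtered or ranked is not stated in the abstract and is left out here.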