Building Ontologies from Text Corpora

Proceedings of the The International Conference on Engineering & MIS 2015. Pub Date: 2015-09-24. DOI: 10.1145/2832987.2833029

Ali Benafia, S. Mazouzi, S. Benafia

Citations: 2

Abstract

This paper presents a novel information-extraction approach for building ontologies from text corpora, covering a wide range of applications. Our goal is a domain-independent method, based on a distributional analysis of semantic units, that brings out all candidate informative elements (concepts, entities, semantic relations, named entities, ...). The method is a pipeline of four main stages that progressively refines the information extracted from unstructured text through a series of decomposable representations (sentences as triplets, "argumental structure", ...) until a consistent final ontology is obtained. We applied the pipeline to repeated samples of 100 articles drawn at random from a text corpus (the 2013 annual edition of Le Monde). In the evaluation of our trial implementation, the system reached an accuracy of up to 74%. Based on these results, we believe that our methodology is quite generic and can be easily adapted to any new domain.
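The evaluation protocol mentioned in the abstract (repeated random samples of 100 articles, with accuracy reported as "up to" a maximum) can be illustrated with a minimal sketch. This is not the authors' code: the function name `repeated_sample_accuracy`, the `is_correct` callback, the number of rounds, and the choice to score accuracy per article are all assumptions made only for illustration, since the abstract does not specify how correctness was judged.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Article = TypeVar("Article")


def repeated_sample_accuracy(
    articles: Sequence[Article],
    is_correct: Callable[[Article], bool],  # hypothetical judge of the extraction result
    sample_size: int = 100,                 # sample size taken from the abstract
    rounds: int = 5,                        # assumed; the abstract gives no count
    seed: int = 0,
) -> List[float]:
    """Return one accuracy value per random sample of the corpus."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    pool = list(articles)
    accuracies = []
    for _ in range(rounds):
        # Draw sample_size distinct articles without replacement.
        sample = rng.sample(pool, sample_size)
        correct = sum(1 for article in sample if is_correct(article))
        accuracies.append(correct / sample_size)
    return accuracies
```

Under these assumptions, a figure such as "up to 74%" would correspond to `max(...)` over the returned list, while the mean over the list would give the average accuracy across samples.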

