Building Ontologies from Text Corpora

Proceedings of the The International Conference on Engineering & MIS 2015. Pub Date: 2015-09-24. DOI: 10.1145/2832987.2833029

Ali Benafia, S. Mazouzi, S. Benafia


Citations: 2

Abstract


Building Ontologies from Text Corpora

This paper presents a novel information-extraction approach for building ontologies from text corpora, intended to cover a wide range of applications. Our goal is a domain-independent method, based on a distributional analysis of semantic units, that brings out all candidate informative elements (concepts, entities, semantic relations, named entities, ...). The method is a pipeline of four main stages that progressively refines the information extracted from unstructured text through a series of decomposable representations (sentences as triplets, "argumental structure", ...) until a consistent final ontology is obtained. We applied the pipeline to repeated samples of 100 articles drawn at random from a text corpus (the 2013 annual edition of 'Le Monde'). In evaluating the trial implementation of our system, we achieved an accuracy of up to 74%. Based on these results, we believe that our methodology is quite generic and can be easily adapted to any new domain.
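The abstract mentions representing sentences as triplets on the way to a final ontology, but it does not describe the four stages themselves. The following is only a toy sketch of that one idea, not the authors' pipeline: the triplets are hand-written (the extraction step is assumed to have already happened), and the names `TRIPLETS` and `build_ontology` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triplets, written by hand as if
# they had already been extracted from sentences. How the paper extracts
# them is not stated in the abstract.
TRIPLETS = [
    ("Paris", "is_capital_of", "France"),
    ("France", "is_member_of", "European Union"),
    ("Paris", "is_located_in", "France"),
    ("Paris", "is_capital_of", "France"),  # duplicate, collapsed below
]


def build_ontology(triplets):
    """Aggregate triplets into a set of concepts and a relation index."""
    concepts = set()
    relations = defaultdict(set)
    for subj, rel, obj in triplets:
        # Both ends of a triplet become candidate concepts.
        concepts.update((subj, obj))
        # Each relation name maps to the set of (subject, object) pairs.
        relations[rel].add((subj, obj))
    return {"concepts": concepts, "relations": dict(relations)}


ontology = build_ontology(TRIPLETS)
```

Using sets means repeated triplets collapse into one entry, which is one simple way to keep the aggregated structure free of duplicates; a real system would also need to merge synonymous concepts and check consistency, which this sketch does not attempt.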

