A semi-automatic approach for building ontologies from a collection of structured web documents

Proceedings of the seventh international conference on Knowledge capture. Pub Date: 2013-06-23. DOI: 10.1145/2479832.2479856

Mouna Kamel, Nathalie Aussenac-Gilles, D. Buscaldi, C. Comparot

Citations: 7

Abstract

Many collections of structured documents are available on the web. Such a collection generally describes the characteristics of entities of a single type, with each page describing one entity. These documents are suitable knowledge sources for building ontologies. Because they share a strong, common layout, they contain less well-written text than plain text files, but their structure is highly meaningful. Classical linguistics-based methods for identifying concepts and relations are no longer appropriate for analyzing them. The approach we propose in this paper exploits various properties of such documents, combining layout/formatting analysis with linguistic analysis and using semantic annotation.


