Automatic Taxonomy Extraction Using Google and Term Dependency

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07). Pub Date: 2007-11-02. DOI: 10.1109/WI.2007.26

M. Makrehchi, M. Kamel

Citations: 19

Abstract

An automatic taxonomy extraction algorithm is proposed. Given a set of terms (a terminology) related to a subject domain, the approach uses Google page counts to estimate the dependency links between the terms. A taxonomic link is an asymmetric relation between two concepts, so symmetric measures such as mutual information and normalized Google distance cannot be used to extract these directed links. Instead, a new measure, the information theoretic inclusion index, is used to obtain a term dependency matrix that represents the pair-wise dependencies. A proposed algorithm then converts the dependency matrix into an adjacency matrix that represents the taxonomy tree. To evaluate its performance, the approach is applied to taxonomy extraction in several domains.
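
The pipeline described in the abstract (page counts, then a directed dependency matrix, then an adjacency matrix for a tree) can be illustrated with a minimal sketch. The abstract gives neither the formula for the inclusion index nor the conversion algorithm, so both are stand-ins here: the dependency of term i on term j is approximated by the conditional estimate hits(i, j) / hits(i), and each term is attached to the broader term (larger page count) it depends on most. The function names, the toy terms, and the page counts are all invented for illustration and are not real Google counts or the authors' method.

```python
def dependency_matrix(count, pair_count):
    """dep[i][j] = hits(i, j) / hits(i): a stand-in asymmetric measure.

    Assumes every single-term count is positive. This is NOT the paper's
    inclusion index, whose definition the abstract does not give.
    """
    n = len(count)
    return [
        [0.0 if i == j else pair_count[i][j] / count[i] for j in range(n)]
        for i in range(n)
    ]


def to_taxonomy(dep, count):
    """Build adj where adj[parent][child] = 1 (hypothetical conversion rule).

    Each term picks, among terms with a strictly larger page count, the one
    it depends on most. Strictly larger counts rule out cycles; terms with
    no broader candidate become roots, so ties can yield a forest.
    """
    n = len(count)
    adj = [[0] * n for _ in range(n)]
    for child in range(n):
        broader = [j for j in range(n) if count[j] > count[child]]
        if broader:
            parent = max(broader, key=lambda j: dep[child][j])
            adj[parent][child] = 1
    return adj


# Toy data, invented for illustration only.
terms = ["science", "physics", "optics"]
counts = [1000, 400, 50]          # hits(term)
pair_counts = [                   # hits(term_i AND term_j), symmetric
    [1000, 300, 30],
    [300, 400, 40],
    [30, 40, 50],
]

dep = dependency_matrix(counts, pair_counts)
adj = to_taxonomy(dep, counts)
for p, row in enumerate(adj):
    for c, linked in enumerate(row):
        if linked:
            print(f"{terms[p]} -> {terms[c]}")
```

With these toy counts, "physics" depends on "science" more strongly (0.75) than the reverse (0.3), which is the kind of direction a symmetric score cannot express. The resulting tree places "physics" under "science" and "optics" under "physics".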



