Querying and clustering Web pages about persons and organizations

Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003). Pub Date: 2003-10-13. DOI: 10.1109/WI.2003.1241214

Shiren Ye, Tat-Seng Chua, Jeremy R. Kei

Citations: 3

Abstract

One of the most frequent Web surfing tasks is searching for the names of persons and organizations. Such names are often common rather than distinctive, so a single name may map to several entities. We describe a methodology for clustering the Web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm combines named-entity, link-based, and structure-based information as features and applies a decision model to partition the document set into direct and indirect pages. It then uses the distinct direct pages as seeds to cluster the document set. The algorithm has been found to be effective for Web-based applications.
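The seed-based step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the decision model or the similarity measure, so the sketch assumes the direct pages are already identified (`direct_ids`), represents each page as a plain set of feature strings, and swaps in Jaccard overlap as the similarity. The function names, the threshold, and the sample pages are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_seeds(
    pages: Dict[str, Set[str]],
    direct_ids: List[str],
    seed_merge_threshold: float = 0.5,  # assumed value, not from the paper
) -> Dict[str, List[str]]:
    """Group pages around distinct direct pages, which act as seeds."""
    if not direct_ids:
        raise ValueError("at least one direct page is needed as a seed")

    # Step 1: keep only distinct direct pages as seeds; a direct page that
    # closely matches an existing seed joins that seed's cluster instead.
    seeds: List[str] = []
    clusters: Dict[str, List[str]] = {}
    for pid in direct_ids:
        match = next(
            (s for s in seeds
             if jaccard(pages[pid], pages[s]) >= seed_merge_threshold),
            None,
        )
        if match is None:
            seeds.append(pid)
            clusters[pid] = [pid]
        else:
            clusters[match].append(pid)

    # Step 2: attach every remaining (indirect) page to its most similar seed.
    direct = set(direct_ids)
    for pid, feats in pages.items():
        if pid in direct:
            continue
        best = max(seeds, key=lambda s: jaccard(feats, pages[s]))
        clusters[best].append(pid)
    return clusters


# Hypothetical pages returned for one ambiguous name.
pages = {
    "home_a": {"J. Smith", "Example University", "example.edu"},
    "home_b": {"J. Smith", "Example Corp", "example.com"},
    "news_1": {"J. Smith", "Example University", "conference"},
    "news_2": {"J. Smith", "Example Corp", "press"},
}
result = cluster_by_seeds(pages, direct_ids=["home_a", "home_b"])
```

Here the two home pages share only the name, so they remain separate seeds, and each news page joins the seed whose features it overlaps most. A real system would replace the feature sets and the similarity with whatever the decision model and clustering step actually use.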

