Querying and clustering Web pages about persons and organizations

Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003)
Publication date: 2003-10-13
DOI: 10.1109/WI.2003.1241214

Shiren Ye, Tat-Seng Chua, Jeremy R. Kei


Citations: 3

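Abstract

One of the most frequent tasks on the Web is searching for the names of persons and organizations. Such names are often non-distinctive, common, and non-unique, so a single name may refer to several entities. We describe a method for clustering the Web pages returned by a search engine so that pages about different entities are placed in different groups. The algorithm combines named-entity, link-based, and structure-based features and uses a decision model to partition the document set into direct and indirect pages. It then uses the distinct direct pages as seeds to cluster the document set. The algorithm has been found to be effective for Web-based applications.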
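The abstract names the two stages (classify pages as direct or indirect, then cluster around distinct direct pages) but not the feature weights, the form of the decision model, or the similarity measure. The Python sketch below illustrates that two-stage pipeline under stated assumptions: the `Page` fields, the hand-weighted score in `is_direct` (a stand-in for the paper's decision model), Jaccard similarity over named-entity sets, and both thresholds are hypothetical choices, not the authors' method. "Direct" is taken here to mean a page mainly about one entity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """One search-result page; all feature fields are hypothetical."""
    url: str
    entities: set[str]         # named entities found in the page text
    name_in_title: bool        # structure-based cue: query name appears in the title
    inlinks_from_results: int  # link-based cue: links from other result pages


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_direct(page: Page, threshold: float = 1.0) -> bool:
    """Stand-in decision model: a linear score with assumed, hand-picked weights."""
    score = 1.0 * page.name_in_title + 0.5 * min(page.inlinks_from_results, 2)
    return score >= threshold


def best_seed(seeds: list[Page], page: Page) -> tuple[Optional[Page], float]:
    """Return the seed whose entity set overlaps most with the page, and that overlap."""
    best, best_sim = None, 0.0
    for s in seeds:
        sim = jaccard(s.entities, page.entities)
        if best is None or sim > best_sim:
            best, best_sim = s, sim
    return best, best_sim


def cluster(pages: list[Page], seed_sim: float = 0.5,
            min_sim: float = 0.1) -> tuple[dict[str, list[str]], list[str]]:
    """Stage 1: keep distinct direct pages as seeds. Stage 2: attach indirect pages."""
    direct = [p for p in pages if is_direct(p)]
    indirect = [p for p in pages if not is_direct(p)]

    seeds: list[Page] = []
    clusters: dict[str, list[str]] = {}
    for p in direct:
        seed, sim = best_seed(seeds, p)
        if seed is not None and sim >= seed_sim:
            clusters[seed.url].append(p.url)  # likely the same entity as an existing seed
        else:
            seeds.append(p)                   # a new, distinct entity becomes a seed
            clusters[p.url] = [p.url]

    unassigned: list[str] = []
    for p in indirect:
        seed, sim = best_seed(seeds, p)
        if seed is not None and sim >= min_sim:
            clusters[seed.url].append(p.url)  # join the most similar seed's cluster
        else:
            unassigned.append(p.url)          # too dissimilar to every seed
    return clusters, unassigned
```

With made-up result pages for an ambiguous name, pages carrying the name in the title and linked from other results become seeds, and each indirect page joins the seed sharing the most named entities. Leaving low-similarity pages unassigned, rather than forcing them into a cluster, is one simple way to keep unrelated pages out of every entity's group.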



