用于Web文档分层聚类的基于术语的算法

A. Schenker, Mark Last, A. Kandel
{"title":"用于Web文档分层聚类的基于术语的算法","authors":"A. Schenker, Mark Last, A. Kandel","doi":"10.1109/NAFIPS.2001.943719","DOIUrl":null,"url":null,"abstract":"In this paper we introduce the novel class hierarchy construction algorithm (CHCA) in order to create hierarchical clusterings of Web documents. Unlike most clustering methods, CHCA operates on nominal data (the words occurring in each document) and it differs from other hierarchical clustering techniques in that it uses the object-oriented concept of inheritance to create the parent/child relationship between clusters. A prototype system has been developed using CHCA to create cluster hierarchies from web search results returned by conventional search engines. CHCA, without any guidance, creates term-based clusters from the contents of the retrieved pages and assigns each page to a cluster; the clusters correspond to topics and sub-topics in the investigated domain. The performance of our system is compared with a similar web search clustering system (Vivisimo).","PeriodicalId":227374,"journal":{"name":"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"A term-based algorithm for hierarchical clustering of Web documents\",\"authors\":\"A. Schenker, Mark Last, A. Kandel\",\"doi\":\"10.1109/NAFIPS.2001.943719\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we introduce the novel class hierarchy construction algorithm (CHCA) in order to create hierarchical clusterings of Web documents. Unlike most clustering methods, CHCA operates on nominal data (the words occurring in each document) and it differs from other hierarchical clustering techniques in that it uses the object-oriented concept of inheritance to create the parent/child relationship between clusters. A prototype system has been developed using CHCA to create cluster hierarchies from web search results returned by conventional search engines. CHCA, without any guidance, creates term-based clusters from the contents of the retrieved pages and assigns each page to a cluster; the clusters correspond to topics and sub-topics in the investigated domain. The performance of our system is compared with a similar web search clustering system (Vivisimo).\",\"PeriodicalId\":227374,\"journal\":{\"name\":\"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NAFIPS.2001.943719\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NAFIPS.2001.943719","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

本文介绍了一种新的类层次构造算法(CHCA),用于创建Web文档的层次聚类。与大多数聚类方法不同,CHCA对标称数据(每个文档中出现的单词)进行操作,它与其他分层聚类技术的不同之处在于,它使用面向对象的继承概念来创建聚类之间的父/子关系。本文开发了一个原型系统,利用CHCA从传统搜索引擎返回的网络搜索结果中创建集群层次结构。CHCA在没有任何指导的情况下,根据检索页面的内容创建基于术语的集群,并将每个页面分配给一个集群;集群对应于所研究领域的主题和子主题。我们的系统的性能与类似的web搜索聚类系统(Vivisimo)进行了比较。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A term-based algorithm for hierarchical clustering of Web documents
In this paper we introduce the novel class hierarchy construction algorithm (CHCA) in order to create hierarchical clusterings of Web documents. Unlike most clustering methods, CHCA operates on nominal data (the words occurring in each document) and it differs from other hierarchical clustering techniques in that it uses the object-oriented concept of inheritance to create the parent/child relationship between clusters. A prototype system has been developed using CHCA to create cluster hierarchies from web search results returned by conventional search engines. CHCA, without any guidance, creates term-based clusters from the contents of the retrieved pages and assigns each page to a cluster; the clusters correspond to topics and sub-topics in the investigated domain. The performance of our system is compared with a similar web search clustering system (Vivisimo).
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信