{"title":"用于Web文档分层聚类的基于术语的算法","authors":"A. Schenker, Mark Last, A. Kandel","doi":"10.1109/NAFIPS.2001.943719","DOIUrl":null,"url":null,"abstract":"In this paper we introduce the novel class hierarchy construction algorithm (CHCA) in order to create hierarchical clusterings of Web documents. Unlike most clustering methods, CHCA operates on nominal data (the words occurring in each document) and it differs from other hierarchical clustering techniques in that it uses the object-oriented concept of inheritance to create the parent/child relationship between clusters. A prototype system has been developed using CHCA to create cluster hierarchies from web search results returned by conventional search engines. CHCA, without any guidance, creates term-based clusters from the contents of the retrieved pages and assigns each page to a cluster; the clusters correspond to topics and sub-topics in the investigated domain. The performance of our system is compared with a similar web search clustering system (Vivisimo).","PeriodicalId":227374,"journal":{"name":"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"A term-based algorithm for hierarchical clustering of Web documents\",\"authors\":\"A. Schenker, Mark Last, A. Kandel\",\"doi\":\"10.1109/NAFIPS.2001.943719\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we introduce the novel class hierarchy construction algorithm (CHCA) in order to create hierarchical clusterings of Web documents. Unlike most clustering methods, CHCA operates on nominal data (the words occurring in each document) and it differs from other hierarchical clustering techniques in that it uses the object-oriented concept of inheritance to create the parent/child relationship between clusters. A prototype system has been developed using CHCA to create cluster hierarchies from web search results returned by conventional search engines. CHCA, without any guidance, creates term-based clusters from the contents of the retrieved pages and assigns each page to a cluster; the clusters correspond to topics and sub-topics in the investigated domain. The performance of our system is compared with a similar web search clustering system (Vivisimo).\",\"PeriodicalId\":227374,\"journal\":{\"name\":\"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NAFIPS.2001.943719\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NAFIPS.2001.943719","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A term-based algorithm for hierarchical clustering of Web documents
In this paper we introduce the novel class hierarchy construction algorithm (CHCA) in order to create hierarchical clusterings of Web documents. Unlike most clustering methods, CHCA operates on nominal data (the words occurring in each document) and it differs from other hierarchical clustering techniques in that it uses the object-oriented concept of inheritance to create the parent/child relationship between clusters. A prototype system has been developed using CHCA to create cluster hierarchies from web search results returned by conventional search engines. CHCA, without any guidance, creates term-based clusters from the contents of the retrieved pages and assigns each page to a cluster; the clusters correspond to topics and sub-topics in the investigated domain. The performance of our system is compared with a similar web search clustering system (Vivisimo).