A unified framework for clustering heterogeneous Web objects
Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma
Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002. Published 2002-12-12.
DOI: https://doi.org/10.1109/WISE.2002.1181653
Citations: 65
Abstract
We introduce a novel framework for clustering Web data, which is often heterogeneous in nature. Because most existing methods integrate heterogeneous data into a single unified feature space, they have limited flexibility to explore and adjust the contribution of each type of information. In contrast, our framework clusters each type of homogeneous data separately throughout the process, based on its own features, and uses a layered structure with link information to iteratively project and propagate the clustering results between layers until the process converges. Our experimental results show that this scheme not only effectively overcomes the data sparseness caused by the high-dimensional link space but also significantly improves clustering accuracy. We achieve 19% and 41% performance increases when clustering Web pages and users based on a semi-synthetic Web log. Finally, we show a real clustering result based on UC Berkeley's Web log.
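The iterative scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say which clusterer is used or how link information is combined with each layer's own features, so this sketch assumes plain k-means and simple feature concatenation. The names `kmeans`, `project_links`, and `layered_clustering` are hypothetical, as is the two-layer setup of users and pages joined by a visit-count matrix. The point it tries to show is the projection step: a user's link vector over all pages is collapsed into counts per page cluster, which shrinks the sparse, high-dimensional link space before the next clustering round.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def project_links(links, labels, k):
    """Collapse link columns into per-cluster totals: (n, m) -> (n, k)."""
    onehot = np.eye(k)[labels]  # (m, k) cluster membership matrix
    return links @ onehot

def layered_clustering(page_feats, user_feats, links, k_pages, k_users,
                       max_rounds=20):
    """links[u, p] is the (assumed) number of visits of user u to page p."""
    page_feats = np.asarray(page_feats, dtype=float)
    user_feats = np.asarray(user_feats, dtype=float)
    links = np.asarray(links, dtype=float)
    # Start by clustering each layer on its own features only.
    page_labels = kmeans(page_feats, k_pages)
    user_labels = kmeans(user_feats, k_users)
    for _ in range(max_rounds):
        # Users: own features plus links summarized by current page clusters.
        u_x = np.hstack([user_feats,
                         project_links(links, page_labels, k_pages)])
        new_user = kmeans(u_x, k_users)
        # Pages: own features plus links summarized by the new user clusters.
        p_x = np.hstack([page_feats,
                         project_links(links.T, new_user, k_users)])
        new_page = kmeans(p_x, k_pages)
        # Simple convergence check: stop when neither labeling changes.
        done = (np.array_equal(new_user, user_labels)
                and np.array_equal(new_page, page_labels))
        user_labels, page_labels = new_user, new_page
        if done:
            break
    return page_labels, user_labels
```

Concatenating unweighted features is the simplest possible choice here. A closer reading of the framework would likely weight the link-derived features against each layer's own features, which is exactly the kind of adjustment the abstract says a single unified feature space makes difficult.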