Improving Vietnamese web page clustering by combining neighbors' content and using iterative feature selection
Authors: Le Viet Hung, N. K. Anh, N. H. Dang
Published in: Proceedings of the 3rd Symposium on Information and Communication Technology, 2012-08-23
DOI: 10.1145/2350716.2350726
Web page clustering is a fundamental technique for managing, locating, and interpreting Web data, and it helps users navigate, distinguish, and understand Web content. Most existing clustering algorithms do not adapt well to Web pages directly, in either efficiency or effectiveness, because of high dimensionality and data sparseness. The uncontrolled nature of Web content presents additional challenges, whereas the interconnected nature of hypertext can provide useful information for the clustering process. To address these problems, we propose a new Web page clustering method that combines neighbors' content to overcome data sparseness and uses Iterative Feature Selection to remove noisy and redundant features and improve clustering performance. Experimental results show that the proposed method significantly improves the performance of Vietnamese Web page clustering while using a relatively small number of descriptive features.
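The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so every name and parameter here is an assumption (`combine_with_neighbors`, the neighbor weight `alpha`, the k-means seeding, and the simple cluster-contrast score used to rank terms, which stands in for whatever selection criterion the paper uses). The sketch adds a down-weighted average of linked pages' term counts to each page, then alternates between clustering and keeping the terms that best separate the current clusters.

```python
import math
from collections import Counter


def combine_with_neighbors(vectors, links, alpha=0.5):
    """Add a down-weighted average of linked pages' term counts to each page.

    `alpha` is a hypothetical weight; the abstract does not specify one.
    """
    combined = {}
    for page, vec in vectors.items():
        merged = Counter(vec)
        neighbors = [n for n in links.get(page, []) if n in vectors]
        for n in neighbors:
            for term, count in vectors[n].items():
                merged[term] += alpha * count / len(neighbors)
        combined[page] = dict(merged)
    return combined


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iterations=10):
    """Cosine k-means seeded with the first k page ids (assumes k <= pages)."""
    pages = sorted(vectors)
    centroids = [dict(vectors[p]) for p in pages[:k]]
    labels = {}
    for _ in range(iterations):
        labels = {
            p: max(range(k), key=lambda c: cosine(vectors[p], centroids[c]))
            for p in pages
        }
        for c in range(k):
            members = [vectors[p] for p in pages if labels[p] == c]
            if members:
                total = Counter()
                for m in members:
                    for t, w in m.items():
                        total[t] += w
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


def select_features(vectors, labels, k, keep):
    """Rank terms by how unevenly they occur across clusters; keep the top ones.

    The score (max minus min per-cluster document rate) is an illustrative
    stand-in for a supervised criterion such as chi-square.
    """
    sizes = Counter(labels.values())
    doc_freq = {}
    for page, vec in vectors.items():
        for term in vec:
            doc_freq.setdefault(term, Counter())[labels[page]] += 1

    def score(term):
        rates = [doc_freq[term][c] / sizes[c] for c in range(k) if sizes[c]]
        return max(rates) - min(rates)

    ranked = sorted(doc_freq, key=lambda t: (-score(t), t))
    return set(ranked[:keep])


def iterative_feature_selection(vectors, k, keep, rounds=3):
    """Alternate clustering and feature selection, using clusters as pseudo-classes."""
    labels = kmeans(vectors, k)
    features = {t for v in vectors.values() for t in v}
    for _ in range(rounds):
        features = select_features(vectors, labels, k, keep)
        reduced = {
            p: {t: w for t, w in v.items() if t in features}
            for p, v in vectors.items()
        }
        labels = kmeans(reduced, k)
    return labels, features
```

To use the sketch, build a term-count dictionary per page, pass it with a hyperlink map to `combine_with_neighbors`, and hand the result to `iterative_feature_selection` with the desired number of clusters and features. A term that appears on every page regardless of topic scores zero under this contrast measure, so it is dropped first.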