{"title":"聚集带有扩展标签的网页","authors":"Li Zhao, Lianhe Yang, Yinghuang Liang","doi":"10.1109/ICCIAUTOM.2011.6183985","DOIUrl":null,"url":null,"abstract":"Social annotations e.g. tags are good descriptors of web page semantics, which have large potential for web document clustering. However, most web pages have few tags. The sparsity seriously affects the clustering performance. To overcome the problem, we incorporate user-related tag context, a specially constructed tag set, to improve the topic representation and estimation for documents. Experimental results demonstrate the nice effect of tag context on addressing the sparsity problem. Compared to clustering based on non-expanded tags, our approach achieves a statistically significant increase of 26.5% to 47.4% on F1 score.","PeriodicalId":177039,"journal":{"name":"2011 2nd International Conference on Control, Instrumentation and Automation (ICCIA)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Clustering web pages with expanded tags\",\"authors\":\"Li Zhao, Lianhe Yang, Yinghuang Liang\",\"doi\":\"10.1109/ICCIAUTOM.2011.6183985\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Social annotations e.g. tags are good descriptors of web page semantics, which have large potential for web document clustering. However, most web pages have few tags. The sparsity seriously affects the clustering performance. To overcome the problem, we incorporate user-related tag context, a specially constructed tag set, to improve the topic representation and estimation for documents. Experimental results demonstrate the nice effect of tag context on addressing the sparsity problem. 
Compared to clustering based on non-expanded tags, our approach achieves a statistically significant increase of 26.5% to 47.4% on F1 score.\",\"PeriodicalId\":177039,\"journal\":{\"name\":\"2011 2nd International Conference on Control, Instrumentation and Automation (ICCIA)\",\"volume\":\"78 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 2nd International Conference on Control, Instrumentation and Automation (ICCIA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCIAUTOM.2011.6183985\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 2nd International Conference on Control, Instrumentation and Automation (ICCIA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCIAUTOM.2011.6183985","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Social annotations such as tags are good descriptors of web page semantics and have great potential for web document clustering. However, most web pages have few tags, and this sparsity seriously degrades clustering performance. To overcome the problem, we incorporate user-related tag context, a specially constructed tag set, to improve topic representation and estimation for documents. Experimental results demonstrate that tag context is effective in addressing the sparsity problem. Compared with clustering based on non-expanded tags, our approach achieves a statistically significant increase of 26.5% to 47.4% in F1 score.