Search Results Clustering in Chinese Context Based on a New Suffix Tree
Jiangning Wu, Zhijiang Wang
2008 IEEE 8th International Conference on Computer and Information Technology Workshops, 2008-07-08
DOI: 10.1109/CIT.2008.WORKSHOPS.63
Search engines have become an increasingly popular way to find information in recent years. However, most Chinese Web search engines return thousands or even millions of documents per query, so there is a critical need for search results clustering: grouping similar documents on-line to improve the user experience when searching collections of Web pages and to make Chinese Web pages browsable in a more compact, thematic form. This paper presents a new suffix tree clustering (STC) algorithm for Web search results clustering that is better suited to the Chinese context. The algorithm is built on Chinese words, and meaningless phrases are discarded by an efficient strategy that we propose. In addition, Chinese synonymy is introduced into the suffix tree to improve the quality of the clusters. Experiments show that the proposed STC algorithm achieves better precision and speed than the original STC.
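The abstract does not give the algorithm's details, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the snippets are already segmented into Chinese words, uses a plain dictionary of word-level phrases in place of a real suffix tree, and relies on a hypothetical synonym table (SYNONYMS) and stop-word list (STOP_WORDS) as stand-ins for the paper's synonymy handling and meaningless-phrase filter. The function name base_clusters and its parameters are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical resources: the paper's actual synonym dictionary and
# phrase filter are not described in the abstract.
SYNONYMS = {"电脑": "计算机"}      # variant word -> canonical word
STOP_WORDS = {"的", "了", "是"}    # words that make a phrase meaningless at its edges


def normalize(words):
    """Replace each word with its canonical synonym, if one is listed."""
    return [SYNONYMS.get(w, w) for w in words]


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each shared word-level phrase to the set of documents containing it.

    A real suffix tree would index the same phrases more compactly;
    a dictionary is used here only to keep the sketch short.
    """
    phrase_docs = defaultdict(set)
    for doc_id, words in enumerate(docs):
        words = normalize(words)
        for start in range(len(words)):
            stop = min(start + max_len, len(words))
            for end in range(start + 1, stop + 1):
                phrase = tuple(words[start:end])
                # Skip phrases that begin or end with a stop word.
                if phrase[0] in STOP_WORDS or phrase[-1] in STOP_WORDS:
                    continue
                phrase_docs[phrase].add(doc_id)
    # Keep only phrases shared by at least min_docs documents.
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= min_docs}


# Three pre-segmented toy snippets (invented for illustration).
docs = [
    ["苹果", "电脑", "的", "价格"],
    ["苹果", "计算机", "评测"],
    ["苹果", "的", "营养"],
]
clusters = base_clusters(docs)
for phrase, ids in clusters.items():
    print("".join(phrase), sorted(ids))
```

In this toy input, the first two snippets end up sharing the phrase 苹果计算机 only because 电脑 is mapped to 计算机 before indexing, which is the kind of effect the synonymy step is meant to have on cluster quality.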