Applying Semantic Suffix Net to suffix tree clustering
Authors: Jongkol Janruang, S. Guha
Published in: 2011 3rd Conference on Data Mining and Optimization (DMO), 2011-06-28
DOI: 10.1109/DMO.2011.5976519 (https://doi.org/10.1109/DMO.2011.5976519)
Citations: 11
Abstract
In this paper we consider the problem of clustering snippets returned by search engines. We propose a technique that brings semantic similarity into the clustering process. Our technique improves on the well-known suffix tree clustering (STC) method, a highly efficient heuristic for clustering web search results. A weakness of STC, however, is that it cannot cluster semantically similar documents. To solve this problem, we propose a new data structure, called a Semantic Suffix Net (SSN), that represents the suffixes of a single string. A generalized semantic suffix net represents the suffixes of a set of strings and is built with a new operator that partially combines nets. A key feature of this operator is that it finds a joint point by using both semantic similarity and string matching; the combination of a pair of nets then begins at that joint point. This reduces the number of nodes and branches in the generalized semantic suffix net. The operator then uses the line of suffix links as a boundary to separate the net. The generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.
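The weakness of STC noted in the abstract can be illustrated with a small sketch. The code below is not the authors' Semantic Suffix Net or their combining operator. It is a simplified stand-in that groups snippets by shared word phrases, as in the base-cluster step of STC, and then repeats the grouping after mapping words through a lookup table. The table `SYNONYMS`, the helper names, and the sample snippets are all hypothetical, and the table stands in for a real semantic-similarity measure.

```python
from collections import defaultdict

# Hypothetical synonym table; a stand-in for a real semantic-similarity measure.
SYNONYMS = {"automobile": "car", "vehicle": "car", "physician": "doctor"}


def canonical(word, synonyms):
    """Lower-case a word and map it to its canonical form if one is listed."""
    w = word.lower()
    return synonyms.get(w, w)


def suffix_phrases(words, max_len=3):
    """Yield every prefix (up to max_len words) of every suffix of the word list."""
    for start in range(len(words)):
        last = min(start + max_len, len(words))
        for end in range(start + 1, last + 1):
            yield tuple(words[start:end])


def base_clusters(snippets, synonyms=None, min_docs=2):
    """Group snippet indices by shared phrase, keeping phrases found in min_docs or more snippets."""
    synonyms = synonyms or {}
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = [canonical(w, synonyms) for w in text.split()]
        for phrase in suffix_phrases(words):
            phrase_docs[phrase].add(doc_id)
    return {p: docs for p, docs in phrase_docs.items() if len(docs) >= min_docs}


# Hypothetical snippets: the first two mean the same thing but differ in one word.
snippets = ["cheap car rental", "cheap automobile rental", "doctor on call"]

plain = base_clusters(snippets)               # exact string matching only
semantic = base_clusters(snippets, SYNONYMS)  # matching after synonym mapping

print(sorted(plain))
print(sorted(semantic))
```

With exact matching, the first two snippets share only the single-word phrases "cheap" and "rental". Once "automobile" is mapped to "car", they also share the full phrase "cheap car rental", which is the kind of grouping that string matching alone misses. A dictionary lookup is the simplest possible choice here; the paper's operator instead finds a joint point between nets using semantic similarity together with string matching.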