Applying Semantic Suffix Net to suffix tree clustering

Jongkol Janruang, S. Guha
{"title":"Applying Semantic Suffix Net to suffix tree clustering","authors":"Jongkol Janruang, S. Guha","doi":"10.1109/DMO.2011.5976519","DOIUrl":null,"url":null,"abstract":"In this paper we consider the problem of clustering snippets returned from search engines. We propose a technique to invoke semantic similarity in the clustering process. Our technique improves on the well-known STC method, which is a highly efficient heuristic for clustering web search results. However, a weakness of STC is that it cannot cluster semantic similar documents. To solve this problem, we propose a new data structure to represent suffixes of a single string, called a Semantic Suffix Net (SSN). A generalized semantic suffix net is created to represent suffixes of a set of strings by using a new operator to partially combine nets. A key feature of this new operator is to find a joint point by using semantic similarity and string matching; net pairs combination then begins at that joint point. This logic causes the number of nodes and branches of a generalized semantic suffix net to decrease. The operator then uses the line of suffix links as a boundary to separate the net. A generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 3rd Conference on Data Mining and Optimization (DMO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DMO.2011.5976519","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

In this paper we consider the problem of clustering snippets returned from search engines. We propose a technique to invoke semantic similarity in the clustering process. Our technique improves on the well-known STC method, which is a highly efficient heuristic for clustering web search results. However, a weakness of STC is that it cannot cluster semantic similar documents. To solve this problem, we propose a new data structure to represent suffixes of a single string, called a Semantic Suffix Net (SSN). A generalized semantic suffix net is created to represent suffixes of a set of strings by using a new operator to partially combine nets. A key feature of this new operator is to find a joint point by using semantic similarity and string matching; net pairs combination then begins at that joint point. This logic causes the number of nodes and branches of a generalized semantic suffix net to decrease. The operator then uses the line of suffix links as a boundary to separate the net. A generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.
语义后缀网在后缀树聚类中的应用
本文研究了搜索引擎返回的片段聚类问题。我们提出了一种在聚类过程中调用语义相似度的技术。我们的技术改进了著名的STC方法,这是一种高效的启发式聚类web搜索结果。然而,STC的一个缺点是不能聚类语义相似的文档。为了解决这个问题,我们提出了一种新的数据结构来表示单个字符串的后缀,称为语义后缀网(SSN)。通过使用一个新的运算符对网络进行部分组合,创建了一个广义语义后缀网络来表示一组字符串的后缀。该算子的一个关键特征是利用语义相似度和字符串匹配来寻找结合点;网对组合就从这个连接点开始。这种逻辑导致广义语义后缀网络的节点和分支数量减少。然后,操作员使用后缀链接线作为分隔网的边界。然后在STC算法中加入广义语义后缀网络,使其能够聚类语义相似的片段。实验结果表明,该算法比传统的STC算法有很大的改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信