A Class-Based Search System in Unstructured P2P Networks
Juncheng Huang, Xiuqi Li, Jie Wu
21st International Conference on Advanced Information Networking and Applications (AINA '07), published 2007-05-21
DOI: 10.1109/AINA.2007.8
Citations: 15
Abstract
Efficient searching is an important design issue in peer-to-peer (P2P) networks. Among the many search techniques proposed, semantic-based searching has recently drawn significant attention. The Gnutella-like efficient searching system (GES) of Zhu et al. (2005) is one such system. Based on the vector space model (VSM), GES derives a node vector, a semantic summary of all the documents on a node, and designs its topology adaptation algorithm and search protocol around the similarity between the node vectors of different nodes. GES works well when the distribution of documents on each node is uniform, but it may be inefficient when that distribution is diverse: when a node holds documents from many categories, its node vector can represent them inaccurately. We extend the idea of GES and present a class-based semantic searching system (CSS). CSS uses a data clustering algorithm, the online spherical k-means clustering (OSKM) of Zhang (2005), to cluster the documents on each node into several classes. Each class can be viewed as a virtual node, and virtual nodes are connected through virtual links. Class vectors thus replace node vectors and play the central role in class-based topology adaptation and search, which makes CSS very efficient. Our simulations on the TREC IR benchmark collection show that CSS outperforms GES, achieving higher recall, higher precision, and lower search cost.
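The abstract only outlines the mechanics, so what follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It builds VSM document vectors using raw term-frequency weighting (an assumption; the weighting scheme is not given here). It then forms a single GES-style node vector as the normalised sum of a node's document vectors (a simplification), and clusters the same documents with a basic online spherical k-means loop: each document pulls its most similar class vector towards itself, and that vector is then re-normalised to unit length. The seeding rule, the decaying learning-rate schedule, the function names (`doc_vector`, `node_vector`, `oskm`) and the toy documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unit(vec):
    """Scale a sparse vector (term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_scaled(a, b, scale):
    """Return a + scale * b for sparse vectors."""
    out = dict(a)
    for t, w in b.items():
        out[t] = out.get(t, 0.0) + scale * w
    return out


def doc_vector(text):
    """Hypothetical helper: VSM vector with raw term frequencies (assumed weighting)."""
    return unit(dict(Counter(text.lower().split())))


def node_vector(doc_vecs):
    """Hypothetical GES-style node summary: normalised sum of all document vectors."""
    total = {}
    for v in doc_vecs:
        total = add_scaled(total, v, 1.0)
    return unit(total)


def oskm(doc_vecs, k, epochs=5, eta0=1.0, eta_f=0.01):
    """Online spherical k-means sketch: returns (class vectors, class index per doc)."""
    # Assumption: deterministic farthest-first seeding so the example is repeatable.
    centroids = [dict(doc_vecs[0])]
    while len(centroids) < k:
        far = min(doc_vecs, key=lambda x: max(cosine(x, c) for c in centroids))
        centroids.append(dict(far))
    total_steps = epochs * len(doc_vecs)
    t = 0
    for _ in range(epochs):
        for x in doc_vecs:
            # Assumption: learning rate decays exponentially from eta0 towards eta_f.
            eta = eta0 * (eta_f / eta0) ** (t / total_steps)
            best = max(range(k), key=lambda j: cosine(x, centroids[j]))
            # Winner-take-all: move the closest class vector towards x, re-normalise.
            centroids[best] = unit(add_scaled(centroids[best], x, eta))
            t += 1
    labels = [max(range(k), key=lambda j: cosine(x, centroids[j])) for x in doc_vecs]
    return centroids, labels


# Toy node holding documents from two unrelated categories.
docs = [
    "football match goal team score",
    "team wins football league match",
    "goal score team football",
    "recipe pasta sauce cook tomato",
    "cook pasta with tomato sauce",
    "tomato sauce recipe cook",
]
vecs = [doc_vector(d) for d in docs]
query = doc_vector("football team goal")

node = node_vector(vecs)
classes, labels = oskm(vecs, k=2)

print("class of each document:", labels)
print("query vs node vector: %.3f" % cosine(query, node))
print("query vs best class vector: %.3f" % max(cosine(query, c) for c in classes))
```

On this toy node the single node vector blends both topics, so the sports query scores noticeably lower against it than against the sports class vector. This mirrors the inaccuracy the abstract attributes to node vectors on nodes with diverse document categories, and it is why CSS treats each class as its own virtual node.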