A Class-Based Search System in Unstructured P2P Networks

Juncheng Huang, Xiuqi Li, Jie Wu
Published in: 21st International Conference on Advanced Information Networking and Applications (AINA '07), 2007-05-21
DOI: 10.1109/AINA.2007.8 (https://doi.org/10.1109/AINA.2007.8)
Citations: 15

Abstract

Efficient searching is one of the important design issues in peer-to-peer (P2P) networks. Among the various searching techniques, semantic-based searching has recently drawn significant attention. The Gnutella-like efficient searching system (GES) of Zhu et al. (2005) is one such system. GES derives a node vector, a semantic summary of all the documents on a node, based on the vector space model (VSM). Its topology adaptation algorithm and search protocol are then designed around the similarity between the node vectors of different nodes. However, although GES is suitable when the distribution of documents on each node is uniform, it may not be efficient when the distribution is diverse: when a node holds documents from many categories, the node vector may represent them inaccurately. We extend the idea of GES and present a class-based semantic searching system (CSS). CSS uses a data clustering algorithm, online spherical k-means clustering (OSKM) from the work of Zhang (2005), to cluster all documents on a node into several classes. Each class can be viewed as a virtual node, and virtual nodes are connected through virtual links. As a result, the class vector replaces the node vector in the class-based topology adaptation and search process, which makes CSS very efficient. Our simulation using the IR benchmark TREC collection demonstrates that CSS outperforms GES, with higher recall, higher precision, and lower search cost.
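The contrast between a single node vector and per-class vectors can be illustrated with a small sketch. It is not the authors' implementation: the four toy documents, the raw term-frequency weighting, the farthest-first seeding, and the batch spherical k-means loop (a simplified stand-in for the online OSKM algorithm named in the abstract) are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def unit(vec):
    """Scale a sparse term vector (dict) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def doc_vector(text):
    """Unit-length raw term-frequency vector (assumed weighting, not TF-IDF)."""
    return unit(Counter(text.lower().split()))

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def summary(vectors):
    """Unit-length sum of vectors: serves as a node vector or a class vector."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return unit(total)

def spherical_kmeans(vectors, k, rounds=10):
    """Batch spherical k-means; a simplified stand-in for online OSKM."""
    # Farthest-first seeding (an assumption): start from the first document,
    # then repeatedly add the document least similar to the seeds so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, s) for s in centroids))
        )
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            clusters[best].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            summary(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids

# Hypothetical node holding documents from two unrelated categories.
node_docs = [
    "guitar chord melody rhythm",
    "piano melody chord harmony",
    "oven bake flour dough",
    "bake bread flour yeast",
]
doc_vectors = [doc_vector(d) for d in node_docs]
query = doc_vector("melody chord")

# GES-style summary: one vector for the whole node.
node_sim = cosine(query, summary(doc_vectors))

# CSS-style summary: one vector per class (virtual node).
class_vectors = spherical_kmeans(doc_vectors, k=2)
class_sim = max(cosine(query, c) for c in class_vectors)

print(f"node vector similarity:  {node_sim:.3f}")
print(f"best class similarity:   {class_sim:.3f}")
```

On this toy data the query matches the music class vector with a similarity of about 0.82, while the mixed node vector scores only about 0.58, because the cooking terms dilute it. This is the effect the abstract describes when a node holds many categories of documents.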