Content-Based Clustered P2P Search Model Depending on Set Distance

2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops Pub Date : 2006-12-18 DOI:10.1109/WI-IATW.2006.53

Jing Wang, Shoubao Yang


Citations: 4

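Abstract

In content-based unstructured P2P search systems, two problems dominate query efficiency and search cost: the complexity of computing document similarity in high-dimensional spaces, and the large volume of redundant messages generated by flooding. This paper defines document similarity in terms of set distance, which bounds the cost of computing similarity to linear time. It also clusters peers by content according to their set distance, reducing both query time and redundant messages. Simulations show that the resulting content-based search model achieves higher recall while reducing search cost and query time to 40% and 30%, respectively, of those of Gnutella.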
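The abstract does not give the paper's definition of set distance, so the following is only a minimal sketch under a stated assumption: it uses the Jaccard distance between term sets as a stand-in. The function names `set_distance` and `assign_to_cluster`, and the sample data, are hypothetical and not taken from the paper. The sketch illustrates why a set-based measure can be computed in expected linear time (hash-set intersection) and how a peer might be placed in the content cluster nearest to it.

```python
from __future__ import annotations


def set_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance between two term sets.

    ASSUMPTION: Jaccard distance stands in for the paper's set distance,
    whose exact definition is not given in the abstract.
    Hash-set intersection runs in expected O(min(|a|, |b|)) time.
    """
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return 1.0 - shared / union


def assign_to_cluster(peer_terms: set[str], clusters: dict[str, set[str]]) -> str:
    """Return the name of the cluster whose term set is closest to the peer's.

    Hypothetical helper: each cluster is summarised by a single term set.
    """
    return min(clusters, key=lambda name: set_distance(peer_terms, clusters[name]))


# Illustrative data only.
peer = {"p2p", "search", "cluster"}
clusters = {
    "networking": {"p2p", "routing", "search"},
    "biology": {"gene", "protein"},
}
nearest = assign_to_cluster(peer, clusters)
```

In this sketch a query would be forwarded first to peers in the nearest cluster instead of being flooded to every neighbour, which is the kind of saving in messages the abstract describes.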




Source journal

2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops

Self-citation rate

0.00%

Number of publications